This paper is part of a project that has received funding from the French “Agence de l'innovation de défense”.

This article has been written using GNU TeXmacs [25].
In this paper, we present a probabilistic algorithm for multiplying two sparse polynomials almost as efficiently as two dense univariate polynomials with a result of approximately the same size. The algorithm depends on unproven heuristics that will be made precise. We also present non-heuristic versions that are slower by only a constant factor.
        
      Let 
 be polynomials that are represented in the
      usual way as linear combinations of power products. The problem of
      sparse polynomial multiplication is to compute the product 
 in a way that is as efficient as possible in terms of
      the total bitsize of 
, 
, and 
 (and
      where we use a similar sparse representation for 
 as for 
 and 
).
    
For pedagogical reasons, we mainly restrict our attention to polynomials with integer coefficients. Together with polynomials with rational coefficients, this is indeed the most important case for practical implementations inside computer algebra systems. Nevertheless, it is not hard to adapt our techniques to coefficients in more general rings (some indications to that effect are given in section 5.2). Still for pedagogical reasons, we will carry out our complexity analysis in the RAM model [12]. We expect our algorithms to adapt to the Turing model [40], but more work will be needed to prove this and some of the constant factors might deteriorate.
For polynomials of modest size, naive algorithms are often most efficient. We refer to [4, 7, 11, 21, 32, 37, 38, 45] for implementation techniques that are efficient in practice. Various types of faster algorithms have been proposed for polynomials with special supports [18, 20, 26, 41].
      Asymptotically fast methods for polynomials of large sizes usually rely
on sparse interpolation. The seminal paper by Ben-Or and Tiwari [5] triggered the development of many fast algorithms for the
      sparse interpolation of polynomial blackbox functions [1,
      6, 10, 22, 27–29, 31, 33, 39]. In
      this framework, the unknown polynomial 
 is given
      through a blackbox function that can be evaluated at points in suitable
      extensions of the coefficient ring. We refer to [42] for a
      nice survey on sparse interpolation and other algorithms to compute with
      sparse polynomials. The present paper grew out of our recent preprint
      [24] with Grégoire Lecerf on this topic; the idea to
      “exploit colliding terms” in section 6.6 forms the starting
      point of this paper. We refer to [24] for more references
      and will also refer to that paper at several places for more background
      information.
    
The most efficient algorithms for sparse interpolation are mostly probabilistic. Here we note that it is usually easy to check that the result is correct with high probability: just evaluate both the blackbox function and its supposed interpolation at a random point and verify that both evaluations coincide. In this paper, all algorithms will be probabilistic, which is suitable for the practical purposes that we are interested in. The running times of our algorithms also rely on suitable heuristics that we will make precise.
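For concreteness, here is a minimal sketch of such a probabilistic verification, assuming that both the blackbox and its supposed interpolation are available as evaluation callbacks; all names and interfaces here are ours, not fixed by the text.

```python
import random

def probably_equal(f, g, p, num_vars, trials=1):
    """Compare a blackbox f with its supposed interpolation g at random
    points modulo a prime p.  A disagreement certifies f != g, whereas
    agreement at independent random points makes f == g very likely."""
    for _ in range(trials):
        point = [random.randrange(p) for _ in range(num_vars)]
        if f(point) % p != g(point) % p:
            return False
    return True
```

Increasing `trials` makes the failure probability decrease geometrically.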
      Although the multiplication problem for sparse polynomials does not
      directly fit into the usual blackbox model, it does benefit from the
      techniques that have been developed for sparse interpolation. Practical
      algorithms along these lines have appeared in [9, 15,
      21, 35]. Most algorithms operate in two
      phases: we first need to determine the exponents of the product 
 and then its coefficients. The first phase is
      typically more expensive when the coefficients of 
      are small, but it becomes cheap for large coefficients, due to the fact
      that we may first reduce 
 modulo a suitable
      prime. It is also customary to distinguish between the supersparse case
      in which the total degree of 
 is allowed to
      become huge, the normally sparse case in which the total degree remains
      small, and the weakly sparse case when the total degree is very small
      with respect to the number of terms. In this paper, we mainly focus on
      the last asymptotic regime, which is most important for practical
      applications with a large number of variables.
    
      In order to describe the complexity results, let us introduce some
      notations. Given a polynomial 
,
      we will write 
 for its total degree, 
 for its number of terms, 
 for the
      number of powers 
 with 
      that occur in its representation (identical powers being counted
      multiple times), and 
 for the maximal absolute
      value of a coefficient. For instance, if 
,
      then we have 
, 
, 
,
      and 
. For our multiplication
      problem 
, the degree 
 of the result is easily determined, but we usually
      only assume an upper bound 
 with 
      for its number of terms.
    
It is interesting to keep track of the dependency of our running times on logarithmic factors in certain parameters, but it is also convenient to ignore less important logarithmic and sublogarithmic factors. We do this by introducing the notation
    
      We also wish to compare the cost of our algorithms with the cost of
      multiplying dense univariate polynomials of approximately the same size.
      Given integers 
, we therefore
      also introduce the following two complexities:
    
          
 stands for the cost of multiplying two
          non-zero polynomials in 
 under the assumption
          that the product 
 satisfies 
.
        
          
 stands for the cost of multiplying two
          polynomials in 
.
        
      We make the customary assumption that 
 and 
 are non-decreasing as functions in 
. By [16], one may take 
. If 
,
      then one also has 
, using
      Kronecker substitution [12].
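As an illustration of the Kronecker substitution just mentioned, the following sketch multiplies two univariate integer polynomials through a single big-integer multiplication; we assume non-negative coefficients (the signed case requires an extra offset), and the interface is ours.

```python
def kronecker_multiply(f, g, coeff_bound):
    """Multiply two univariate integer polynomials, given as coefficient
    lists with 0 <= coefficient < coeff_bound, by evaluating both at
    x = 2**b for a sufficiently large b."""
    prod_bound = coeff_bound * coeff_bound * min(len(f), len(g))
    b = prod_bound.bit_length() + 1     # room for every product coefficient
    F = sum(c << (b * i) for i, c in enumerate(f))
    G = sum(c << (b * i) for i, c in enumerate(g))
    H, mask, h = F * G, (1 << b) - 1, []
    while H:
        h.append(H & mask)              # read off the product coefficients
        H >>= b
    return h

# kronecker_multiply([1, 2], [3, 4], 5) == [3, 10, 8],
# i.e. (1 + 2*x) * (3 + 4*x) == 3 + 10*x + 8*x**2.
```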
    
      One traditional approach for sparse polynomial multiplication is to
      evaluate 
, 
, and 
 at 
 points in a geometric progression 
      modulo a sufficiently large prime number 
,
      where 
 stands for the 
-th prime number. In combination with the tangent
      Graeffe method [24, sections 5 and 7.2] and fast smooth
      factoring (see, e.g. Lemma 5 below), this
      approach allows the exponents of 
 to be computed
      in time
    
[equation (1)]
The coefficients can be recovered using fast Vandermonde system solving, in time
[equation (2)]
      where 
. In our case when 
 is small, we usually have 
, in which case (1) simplifies into
      
. The dependence of the
      complexity on 
 can be reduced using techniques
from [29] and [22], among others.
    
      The main results of this paper are two faster probabilistic algorithms.
      The shorter running times rely on two heuristics HE and
      HC that will be detailed in section 4. For
      any 
, we show (Theorem 6) that the exponents of 
 can be
      computed in time
    
[equation (3)]
      where 
 and 
 is prime. This
      algorithm is probabilistic of Monte Carlo type. Based on numerical
      evidence in section 3, we conjecture that 
. We also show (Theorem 4) that
      the coefficients may be computed in expected time
    
[equation (4)]
      using a probabilistic algorithm of Las Vegas type. In practice, when
      
 is small and 
 not too
      large with respect to 
, the
      corrective terms in (3) are negligible and the cost reduces
      to 
. Similarly, the cost (4) usually simplifies to 
.
      If we also have 
, then this
      means that the cost of the entire sparse multiplication becomes 
. Here we note that 
 also corresponds to the time needed to multiply two dense
      polynomials in 
, provided
      that the product 
 satisfies 
      and 
.
    
      The proof of these bounds relies on the evaluation of 
      at three points of the form 
 in algebras of the
      form 
, where 
. If 
 is sufficiently
      large (namely 
 for some critical value) and we
      already know the exponents of 
,
      then we show how to recover the coefficients with high probability. One
      interesting feature of our algorithm is that three evaluations are
      sufficient with high probability. A logarithmic number of evaluations is
      necessary when using the more obvious iterative approach for which every
      additional evaluation allows us to compute a constant fraction of the
      unknown coefficients (with high probability). Our algorithm is first
      explained with an example in section 2 and then in general
      in section 4.1. The probabilistic analysis is done in
      section 3. In section 4.2, we extend our
      approach to the computation of the exponents, using three additional
      evaluations. Section 5 is devoted to variants and
      extensions of our approach, and further remarks. The final section 6 contains slower but unconditional variants that do not
      depend on the heuristics HE and HC.
    
The present paper works out an idea that was first mentioned in [24, section 6.6], in the context of general sparse interpolation. The application to polynomial multiplication is particularly suitable because of the low amortized cost of blackbox evaluations. Random monomial transformations (also called randomized Kronecker substitution) were considered before in [3, 30]. The idea of using evaluations in (small) cyclic algebras has been used before in [2, 10, 35]. Our algorithms also exploit ideas from the technique of diversification [13, 31]. However, even though the present work borrows a lot of techniques from these previous works, our final complexity bounds (3) and (4) improve on the previously known ones (see section 5.1 for variants in other asymptotic regimes). Finally, the approach of evaluating in cyclic algebras seems close to binning techniques that have recently been applied to compute sparse Fourier transforms [17, 34]; we plan to investigate this parallel in future work.
Notes. We released three versions of the present paper. In April 2022, we added section 6 with variants of our main algorithms that have the major “sales” advantage of being unconditional, but at the same time miss the more subtle charm of reducing the constant factor with respect to dense multiplication to almost one (or two for the exponents).
      The present version integrates various minor corrections and adds a few
      attributions that were absent in the 2020 version. We have also been
      made aware of the fact that the “game of mystery balls” was
      already known in a different form [14]. The critical value
      
 can actually be computed “exactly”:
      
.
    
Finally, we started an experimental implementation, for the case when the exponents and coefficients all fit into machine double precision numbers. A typical multiplication
      of two multivariate sparse polynomials with 
 and
      
 takes about 
 seconds on
      an 
GHz machine. The “asymptotically
      dominant” cyclic multiplications in 
 take
      about 
 seconds, which is about 
times the cost of a cyclic multiplication of length 
, whereas 
 in (5).
      Note that the cyclic multiplications through DFTs benefit from heavier
      optimizations than the subdominant operations.
    
Consider two sparse polynomials
    
      Their product 
 is given by
    
    
      Assume that the monomials 
 of 
      are known, but not the corresponding coefficients. Our aim is to
      determine 
 through its evaluations at
      “points” of the form 
 in 
 for suitable 
 and lengths 
. These evaluations are obtained by
      evaluating 
 and 
 at the
      same points and multiplying the results in 
.
      In what follows, we will use three evaluation points.
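For concreteness, here is a sketch of one such evaluation, for points of the form (x^{s_1}, ..., x^{s_n}) in Z[x]/(x^r - 1); the data layout is ours. Every term lands in the “box” given by its exponent modulo r, and colliding terms are summed, exactly as in the game described next.

```python
def eval_cyclic(terms, s, r):
    """Evaluate a sparse polynomial at (x**s[0], ..., x**s[n-1]) in
    Z[x]/(x**r - 1).  `terms` maps exponent tuples e to coefficients c;
    the term c*x**e contributes c to the coefficient of x**((s.e) % r)."""
    boxes = [0] * r
    for e, c in terms.items():
        boxes[sum(si * ei for si, ei in zip(s, e)) % r] += c
    return boxes
```

Multiplying two such evaluations in Z[x]/(x^r - 1), i.e. a cyclic convolution of length r, then yields the evaluation of the product at the same point.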
    
      Let us show how to turn the problem of computing the coefficients of
      
 into a “game of mystery balls”. At
      the start of the game, we have one numbered ball for each term of 
:
    
    
      For each ball, say 
, the
      corresponding “mystery coefficient” 
      needs to be determined (it might be hidden inside the ball), whereas the
      corresponding exponents 
 are known (and stored in
      a table or painted on the ball). In fact, our game has three identical
      sets of balls, one for each of the three evaluation points. For each of
      these evaluation points, we also have a set of 
      boxes, labeled by 
.
    
      Now consider the evaluation of 
 at a point as
      above, say at 
 in the ring 
. Then each term 
 evaluates
      to a term 
 with 
 and 
 modulo 
.
      In our game, we throw the corresponding ball into the box that is
      labeled by 
. For instance,
      our first ball 
 evaluates to 
      and goes into the box labeled by 
.
      Our second ball 
 evaluates to 
      and goes into the box labeled by 
.
      Continuing this way, we obtain the upper left distribution in Figure 1. Now the complete evaluation of 
 at
      
 in 
 gives
    
    
      For each box, this means that we also know the sum of all coefficients
      hidden in the balls in that box. Indeed, in our example, the first box
      
 contains three balls 
, , and 
,
      with coefficients 
, 
, and 
 that
      sum up to 
. In Figure 1, we indicated these sums below the boxes. In round one of
      our game, we actually took our chances three times, by using the three
      evaluation points 
, 
, and 
 in
      
, and throwing our balls
      accordingly. This corresponds to the top row in Figure 1.
    
ends up in a box of its own for our first throw. Similarly, the balls  and  both have private boxes for the second throw. Ball  also has a private box for the third throw. Removing the balls , , and  from the game, we obtain the second row in Figure 1. We also updated the numbers below the boxes: for every box, the number below it still coincides with the sum of the mystery coefficients inside the balls inside that box. Now that the balls , , and  have been removed, we observe that balls  and  have private boxes in their turn. We may thus determine their mystery coefficients and remove them from the game as well. This brings us to round three of our game and the third row in Figure 1. Going on like this, we win our game when all balls eventually get removed. We lose whenever there exists a round in which there are still some balls left, but all non-empty boxes contain at least two balls. In our example, we win after five rounds.
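The whole game can be summarized by a short peeling procedure. The sketch below removes a ball as soon as it becomes alone in some box (this is the modified game of Remark 2 below); the data layout is ours, and the linked-list bookkeeping of Remark 1 that makes every round run in linear time is omitted for clarity.

```python
from collections import Counter

def play_mystery_balls(throws, box_sums):
    """throws[t][i] is the box of ball i for throw t; box_sums[t][b] is
    the number written below box b of throw t (the sum of the hidden
    coefficients of the balls inside).  Returns the recovered
    coefficients, with None for every ball left when the game is lost.
    box_sums is modified in place."""
    T, n = len(throws), len(throws[0])
    count = [Counter(tw) for tw in throws]   # live balls per box
    coeff, live, progress = [None] * n, set(range(n)), True
    while live and progress:
        progress = False
        for i in list(live):
            private = [t for t in range(T) if count[t][throws[t][i]] == 1]
            if private:
                c = box_sums[private[0]][throws[private[0]][i]]
                coeff[i] = c
                for u in range(T):           # remove ball i everywhere
                    b = throws[u][i]
                    box_sums[u][b] -= c      # update the number below b
                    count[u][b] -= 1
                live.discard(i)
                progress = True
    return coeff
```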
    
    
Remark 1. If we maintain, for each box, a linked list of the balls that it contains, together with the number of such balls, then the
      total amount of work that needs to be done in each round remains
      proportional to the number of balls that are removed instead of the
      total number of boxes. Consequently, the complete game can be played in
      linear time, with high probability.
    
Remark 2. Suppose, for contradiction, that some ball  does
      not get removed in our modified game and choose 
      in such a way that the round 
 in which it gets
      removed in the original game is minimal. Then we observe that all balls
      that were removed before round 
 in the original
      game also get removed in the modified version, eventually. When this
      happens, 
 is in a private box for one of the
      throws: a contradiction.
    
      Have we been lucky in our example with 
 throws of
      
 balls in 
 boxes? For the
      probabilistic analysis in this section, we will assume that our throws
      are random and independent. We will do our analysis for three throws,
      because this is best, although a similar analysis could be carried out
      for other numbers of throws. From now on, we will assume that we have
      
 balls and 
 boxes.
    
      The experiment of throwing 
 balls in 
 boxes has been widely studied in the literature on hash
      tables [8, Chapter 9]. For a fixed ball, the probability
      that all other 
 balls end up in another box is
      given by
    
    
      More generally, for any fixed 
,
      the probability that 
 other balls end up in the
      same box and all the others in other boxes is given by
    
    
      Stated otherwise, we may expect with high probability that approximately
      
 balls end up in a private box, approximately
      
 balls inside a box with one other ball, and so
      on.
    
      This shows how we can expect our balls to be distributed in the first
round of our game, in the limit when 
 gets
      large. Assume more generally that we know the distribution 
 in round 
 and let us show how to
      determine the distribution 
 for the next round.
      More precisely, assume that 
 is the expected
      number of balls in a box with 
 balls in round
      
, where we start with
    
    Setting
    
      we notice that 
, where 
 stands for the expected number of balls that are
      removed during round 
.
    
      Now let us focus on the first throw in round 
      (the two other throws behave similarly, since they follow the same
      probability distribution). There are 
 balls that
      are in a private box for this throw. For each of the remaining balls,
      the probability 
 that it ended up in a private
      box for at least one of the two other throws is
    
    
      The probability that a box with 
 balls becomes
      one with 
 balls in the next round is therefore
      given by
    
    
      For all 
, this yields
    
    
      If 
 tends to zero for large 
      and 
 gets large, then we win our game with high
      probability. If 
 tends to a limit 
, then we will probably lose and end up with
      approximately 
 balls that can't be removed (for
      each of the three throws).
    
We have not yet been able to fully describe the asymptotic behavior of the distribution  for , which follows non-linear dynamics. Nevertheless, it is easy to compute reliable approximations for the coefficients  using interval or ball arithmetic [19]; for this purpose, it suffices to replace each coefficient  in the tail of the distribution (i.e. for large ) by the interval . Tables 1, 2, and 3 show some numerical data that we computed in this way (the error in the numbers being at most ). Our numerical experiments indicate that the “phase change” between winning and losing occurs at a critical value  with
[Table 1]
    
Table 1 shows what happens for : until the seventh round, a bit less than half of the balls get removed at every round. After round eight, the remaining balls are removed at an accelerated rate. For , the distributions  numerically tend to a non-zero limit distribution  with . In Table 3, we show some of the distributions in round ten, for  near the critical point . We also computed an approximation of the limit distribution at the critical point itself.
      
[Table 2]

[Table 3]
      For 
, we note that reliable
      numerical computations can be turned into an actual proof that we lose
      with high probability. Indeed, assume that 
 gets
      very small for some 
, whereas
      
 remains bounded (for instance, in Table 2,
      we have 
 and 
 for 
). Then 
      also gets very small and
    
    
      If 
 happens to be 
 times
      smaller than 
 for some fixed 
      (for instance, in Table 2, we actually have 
 for 
), then a
      standard contracting ball type argument can be used to prove that 
 decreases to zero with geometric speed for 
, while 
      remains bounded away from zero.
    
      Conversely, given 
, it seems
      harder to prove in a similar way that we win. Nevertheless, for any
      
, reliable computations
      easily allow us to determine some round 
 with
      
. For instance, for 
 and 
, we may
      take 
. This is good enough
      for practical purposes: for 
,
      it means that we indeed win with high probability (and even if we do not
      win, then we may rerun the algorithm for the missing terms: see [1] and section 6 below). Here we also note that a
      large number of rounds are generally necessary to win when 
 approaches the critical value 
. For instance, for 
,
      we need to wait until round 
 to get 
. Nevertheless, after a certain number of
      rounds, it seems that 
 always converges to zero
      with superlinear speed.
    
In Table 4, we conclude with the result of a computer simulation in which we played our game with  and . As one can see, the results are close to the theoretically expected ones from Table 1.
      
[Table 4]
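Simulations of this kind are easy to reproduce. The following sketch assumes, as in heuristic HE below, that the three throws are uniform and independent; the sizes and ratios in the example calls are ours and merely illustrative.

```python
import random
from collections import Counter

def simulate_game(n, r, throws=3, max_rounds=200):
    """Play the game with n balls and r boxes per throw, for uniform and
    independent throws.  Returns the number of live balls after each round."""
    pos = [[random.randrange(r) for _ in range(n)] for _ in range(throws)]
    live, history = set(range(n)), [n]
    for _ in range(max_rounds):
        counts = [Counter(pos[t][i] for i in live) for t in range(throws)]
        removed = {i for i in live
                   if any(counts[t][pos[t][i]] == 1 for t in range(throws))}
        if not removed:
            break                 # every non-empty box collides: we lose
        live -= removed
        history.append(len(live))
        if not live:
            break                 # all balls removed: we win
    return history

# Same balls-per-box ratio as the example of section 2 (10 balls, 8 boxes
# per throw): the game is typically won.  With far fewer boxes, a positive
# fraction of the balls typically survives and the game is lost.
print(simulate_game(100000, 80000)[-1])   # usually 0
print(simulate_game(100000, 20000)[-1])   # usually > 0
```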
Remark 3. 
      Let us now turn to the general problem of multiplying sparse
      polynomials. We will focus on the multiplication 
      of integer polynomials 
 in at least two
      variables. We define 
 to be the total degree of
      
 and assume that we have a bound 
      for the number of terms of 
.
    
      As explained in the introduction, we proceed in two phases: we first
      determine the exponents of the unknown product 
. This part is probabilistic of Monte Carlo type,
      where we tolerate a rate of failure 
 that is
      fixed in advance. In the second phase, we determine the unknown
      coefficients, using a probabilistic algorithm of Las Vegas type. In this
      section, we start with the second phase, which has already been
      explained on an example in section 2.
    
      Assume that our product 
 has 
      terms, so that 
 for certain 
      and 
, where 
      for 
. It is obvious how to
      generalize the algorithm from section 2 to this case: for
      some fixed 
, we distribute
      “our 
 balls” over 
      boxes, through the evaluation of 
 at three points
      
 in 
 for 
      and 
. The vectors 
 are essentially chosen at random, but it is convenient to
      take them pairwise non-collinear modulo 
,
      so as to avoid any “useless throws”. We assume the following
      heuristic:
    
            For random vectors 
 as above and each of
            the three throws, the balls are distributed over the boxes in a
            uniformly random way, and the three distributions are independent.
          
We then play our game of mystery balls as usual, which yields the coefficients  if we win, and only a subset of these coefficients if we lose. In view of Remark 2, this leads to Algorithm 1.
      
Algorithm 1
Output: the product 
Assume: 
            
      Proof. The correctness of the algorithm has been
      explained in section 2. As to the running time, we first
      note that none of the integer coefficients encountered during our
      computation exceeds 
 in absolute value. Now steps
      1 and 2 have negligible cost and the running
      time for the remaining steps is as follows:
    
          In step 3, for every term 
 in
          
 or 
 and every 
, we first have to compute 
 modulo 
.
          Since we only have to accumulate 
 when 
, this can be done in time
          
. We next have to add the
          coefficients of all terms that end up in the same box, which amounts
          to 
 additions of cost 
.
        
          In step 4, we do three multiplications of cost 
 each. Since 
 is
          non-decreasing, the cost of these multiplications is bounded by
          
.
        
          In steps 5 and 6, we play our game of
          mystery balls, where 
 stands for the set of
balls that are still in play. Whenever ball 
          ends up in a private box for throw number 
, we have to determine where this ball landed
          for the other throws, by computing 
 modulo
          
 for 
.
          We then have to update the numbers below the corresponding boxes,
          which corresponds to setting 
 in step 6b. Since this eventually has to be done for
          each of the 
 balls, step 6 takes
          
 bit-operations, using a similar analysis as
          for step 3.
        
Let us finally investigate bookkeeping costs that are implicit in our description of the algorithm. Above all, we have to maintain the linked lists with balls inside each box (see Remark 1). This can be done in time
    
      The other implicit costs to maintain various tables are also bounded by
      
.
    
      In [24], we surveyed several strategies for computing the
      exponents of the product 
.
      Most of the approaches from sections 4, 6, and 7 of that paper can be
      adapted to the present setting. We will focus on a probabilistic
      strategy that we expect to be one of the most efficient ones for
      practical purposes (a few variants will be discussed in section 5).
    
      For 
, let 
      be the 
-th prime number and
      let 
. We let 
      be a fixed prime number with 
 (for practical
      implementations, we may also take 
 to be a
      product of prime numbers that fit into machine words, and use
      multi-modular arithmetic to compute modulo 
).
      For some fixed 
, we again use
      
 boxes, and evaluate 
 over
      the ring 
. This time, we use
      six evaluation points of the form 
 and 
 for 
, where the
      
 are chosen at random and pairwise non-collinear
      modulo 
.
    
      Now consider a term 
 of 
. Its evaluation at 
 is 
, where 
      modulo 
. Meanwhile, its
      evaluation at 
 is 
 with
      
. If there is no other term
      that evaluates to an expression of the form 
 at
      
, then the same holds for the
      evaluation at 
. Consequently,
      the unique representative 
 of the quotient 
 in 
 coincides with 
 (the quotient is well defined with high probability). From
      this, we can determine the exponents 
 by
      factoring 
. As additional
      safeguards, we also check that 
,
      that all prime factors of 
 are in 
, and that 
 modulo 
.
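The following sketch summarizes this recovery step for a single box, with naive trial division standing in for the fast factorization of Lemma 5 below; the interfaces and the packaging of the safeguards are ours, and we assume that the denominator is invertible modulo p (which holds with high probability, as noted above).

```python
def recover_exponents(num, den, p, primes, d):
    """If a box contains a single term, the quotient of its two box
    values is q = p_1**e_1 * ... * p_n**e_n modulo p; factoring q over
    p_1 < ... < p_n yields the exponents.  Returns None whenever the
    safeguards detect colliding terms."""
    q = num * pow(den, -1, p) % p        # unique representative in [0, p)
    if q == 0 or q > primes[-1] ** d:    # safeguard: 1 <= q <= p_n**d
        return None
    e = []
    for pi in primes:                    # naive factoring; see Lemma 5
        k = 0                            # below for the fast version
        while q % pi == 0:
            q //= pi
            k += 1
        e.append(k)
    if q != 1 or sum(e) > d:             # leftover prime factor or total
        return None                      # degree too large: collision
    return e
```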
    
      Conversely, assume that there are at least two terms of 
      that evaluate to an expression of the form 
 at
      
. Let 
      and 
 now be the coefficients of 
      in 
 and 
.
      Then the quotient 
 is essentially a random
      element in 
 (see the heuristic
      HC below for more details), so its unique
      representative in 
 is higher than 
 with probability 
.
      This allows our algorithm to detect that we are dealing with colliding
      terms; for the 
 quotients that we need to
consider, the probability of failure becomes 
.
    
      Using the above technique, the three additional evaluations at 
allow us to determine which balls
      end up in a private box in our game of mystery balls. Moreover, since we
      can determine the corresponding exponents, we can also find where these
      balls landed for the two other throws. This allows us to play our game
      modulo minor modifications. Besides maintaining the numbers below the
      boxes for the first three throws, we also maintain the corresponding
      numbers for the evaluations at 
.
      For every round of the game, the same technique then allows us to
      determine which balls have private boxes, and iterate.
    
In our probabilistic analysis, it is important that the quotients  are essentially random elements in  in case of collisions. This assumption might fail when the coefficients of  and  are special (e.g. either zero or one). Nevertheless, it becomes plausible after a change of variables , where  are random invertible elements in . Let us formulate this as our second heuristic:
      
Algorithm 2
Output: the exponents of the product 
Assume: 
            
            For random vectors 
 and after a random
            change of variables 
 (
), the quotients 
            as above are uniformly distributed over 
, and the distributions for different
            quotients are independent.
          
Random changes of variables  are a useful and well-known
      technique in sparse interpolation [31] and the way we use
      it here is related to “diversification” [13].
      Algorithm 2 summarizes our method, while incorporating the
      random change of variables.
    
        
 with 
 distinct
        prime factors, it takes
      
        
        bit operations to determine the existence of a prime factorization
        
 with 
,
and to compute it if it exists.
      
      Proof. We first determine the indices 
 with 
 using a divide and conquer
      technique. At the top level, we start with the remainder 
 of the division of 
 by 
. We next compute 
 and
      
. If 
, then we go on with the computation of 
 and 
, and
      similarly for 
. We repeat
      this dichotomic process until we have found all prime factors of 
. For each of the 
      prime factors 
 of 
,
      we next compute 
 and 
. The total running time of this algorithm is
      bounded by 
.
    
        
        with 
, 
, and 
.
        Then Algorithm 2 is correct and runs in time
      
        
      Proof. We have already explained why our
      algorithm returns the correct answer with probability 
. The complexity analysis for steps 6
and 8b is similar to the one for Algorithm 1. The running time for the other steps is as follows:
    
          The reductions of 
 and 
          modulo 
 can be computed in time 
, in step 3.
        
          In step 4, we have to compute 
          modulo 
 for every power 
          occurring in the representation of 
 or 
. Using binary powering, such a
          power can be computed in time 
.
          The total cost of this step is therefore bounded by 
.
        
          Step 5 takes time 
.
        
          In steps 7 and 8c, we have
          already shown that one factorization can be done in time 
. Altogether, the cost of these
          steps is therefore bounded by 
.
        
      
    
      Note. We assumed that 
 is
      known in the algorithm. The computation of 
 can
      be done in expected time 
: we
      keep picking 
 uniformly at random between 
 and 
 until we find a prime.
      This requires an expected number of 
 attempts of
      cost 
 each [36]. For practical
      purposes, one may use a table of known Mersenne primes instead.
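A sketch of this expected-time prime generation, using any probabilistic primality test (we use sympy's isprime here for brevity):

```python
import random
from sympy import isprime

def random_prime(lo, hi):
    """Pick integers uniformly at random in [lo, hi] until a (probable)
    prime is found; by the prime number theorem, O(log hi) attempts are
    expected."""
    while True:
        p = random.randrange(lo, hi + 1)
        if isprime(p):
            return p
```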
    
      In section 4.2, we have described an efficient algorithm
      for the case when the total degree 
 is small.
This indeed holds for most practical applications, but it is attractive to
      also study the case when our polynomials are “truly sparse”
      with potentially large exponents. This is already interesting for
      univariate polynomials, a case that we also did not consider so far.
      Modulo the technique of Kronecker substitution [24, Section
      7.1], it actually suffices to consider the univariate case. For
      simplicity, let us restrict ourselves to this case. When 
 gets moderately large, it can also be interesting to
      consider the incremental approach from [22] and [24,
      Section 7.3].
    
      One first observation in the univariate case is that we can no longer
      take the same number of boxes 
 for our three
      throws. Next best is to take 
 boxes for our 
-th throw, where 
. By construction this ensures that 
, 
,
      and 
 are pairwise coprime. If the exponents of
      
 are large, then it is also reasonable to assume
      that our heuristic HE still holds.
    
      For the efficient determination of the exponents, let 
. As before, we may take 
      to be prime or a product of prime numbers that fit into machine words.
      Following Huang [29], we now evaluate both 
      and 
 at 
 in 
 for 
. Any term
      
 of 
 then gives rise to a
      term 
 of 
 with 
, so we can directly read off 
 from the quotient 
.
      Modulo the above changes, this allows us to proceed as in Algorithm 2. With the notation
    
    one may prove in a similar way as before that the bit complexity of determining the exponents in this way is bounded by
    
      since 
, 
, and 
 for univariate
      polynomials. Returning to the multivariate setting via Kronecker
      substitution, this yields the bound
    
[equation (5)]
      where 
. Once the exponents
      are known, we may use Algorithm 1 to compute the
      coefficients. In terms of 
,
      the complexity becomes
    
[equation (6)]
,
        it is also possible to use coefficients in 
, while using a fixed point number representation.
        Provided that all coefficients have the same order of magnitude, the
        floating point case can be reduced to this case. Indeed, the
        reductions modulo 
 are remarkably stable from a
        numerical point of view, and we may use FFT-multiplication in 
. Algorithm 2 can
        also be adapted, provided that 
.
      
 small in
        Algorithm 2 by using the incremental technique from [22] and [24, Section 7.3]. Over finite fields
        
 of small characteristic, one needs to use
        other techniques from [24].
      
      Instead of taking our three additional evaluation points of the form
      
, we may for instance take
      them of the form 
, where 
 is a primitive root of unity of large smooth order
      (this may force us to work over an extension field; alternatively, one
      may use the aforementioned incremental technique and roots 
 of lower orders). Thanks to the smoothness assumption,
      discrete logarithms of powers of 
 can be computed
efficiently using the Pohlig–Hellman algorithm. This allows us to
      recover exponents 
 from quotients of the form
      
.
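For completeness, here is a sketch of the Pohlig–Hellman computation alluded to above; we assume that the multiplicative order N of the base g modulo p is known and smooth, with its factorization passed as {q: k}, and all names are ours.

```python
from math import prod

def discrete_log(h, g, order_factors, p):
    """Return e with pow(g, e, p) == h % p, where g has smooth
    multiplicative order N = prod(q**k) modulo p."""
    N = prod(q**k for q, k in order_factors.items())
    residues, moduli = [], []
    for q, k in order_factors.items():
        qk = q**k
        g_q = pow(g, N // qk, p)            # element of order q**k
        h_q = pow(h, N // qk, p)
        gamma = pow(g_q, q**(k - 1), p)     # element of order q
        e_q = 0
        for j in range(k):
            # Peel off the j-th base-q digit of log(h_q) in base g_q.
            t = pow(h_q * pow(g_q, -e_q, p) % p, q**(k - 1 - j), p)
            d, cur = 0, 1
            while cur != t:                 # trial multiplication: q is small
                cur = cur * gamma % p
                d += 1
            e_q += d * q**j
        residues.append(e_q)
        moduli.append(qk)
    e = 0                                   # Chinese remaindering
    for r, m in zip(residues, moduli):
        M = N // m
        e = (e + r * M * pow(M, -1, m)) % N
    return e
```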
    
, in which case complexities are
        measured in terms of the number of operations in 
. For rings of sufficiently large
        characteristic, one may directly adapt the algorithms from section 4. For rings of small characteristic, it is possible to
        generalize the techniques from the finite field case.
      
      The techniques from this paper can also be used for general purpose
      sparse interpolation. In that case the polynomial 
      is given through an arbitrary blackbox function that can be evaluated at
      points in 
-algebras over some
      ring 
. This problem was
studied in detail in [24]. In section 6 of that paper, we investigated in
      particular how to replace expensive evaluations in cyclic 
-algebras of the form 
      by cheaper evaluations at suitable 
-th
      roots of unity in 
 or a small extension of 
. The main new techniques from this
paper were also anticipated in section 6.6 of [24].
    
      The algorithms from this paper become competitive with the geometric
      sequence approach if the blackbox function is particularly cheap to
      evaluate. Typically, the cost 
 of one evaluation
      should be 
 or 
,
      depending on the specific variant of the geometric sequence approach
      that we use.
    
      Technically speaking, it is interesting to note that the problem from
      this paper actually does not fit into this framework. Indeed,
      if 
 and 
 are sparse
polynomials that are represented in their standard expanded form, then
      the evaluation of 
 at a single point requires
      
 operations in 
,
      and we typically have 
. This
      is due to the fact that the standard blackbox model does not take into
      account possible speed-ups if we evaluate our function at many points in
      geometric progression or at 
 in a cyclic algebra
      
.
    
      Both in theory and for practical applications, it would be better to
      extend the blackbox model by allowing for polynomials 
      of the form
    
    
      where 
 is a blackbox function and 
 are sparse polynomials in their usual expanded
      representation. Within this model, the techniques from this paper should
      become efficient for many other useful operations on sparse polynomials,
      such as the computation of gcds or determinants of matrices whose
      entries are large sparse polynomials.
    
      The heuristic HE plays an important role in our
      complexity analysis. It is an interesting question whether it is
      satisfied for polynomials 
 whose support is
      highly regular and not random at all. A particularly important case is
      when 
 and 
 are dense
      polynomials in 
 variables of total degrees 
, 
 and 
. Such polynomials are often
      considered to be sparse, due to the fact that 
      contains about 
 times less terms than a fully
      dense polynomial of degree 
 in each variable.
    
      If 
 is bivariate or trivariate and of very high
      degree, then it has been shown in [18] that 
 can be computed in approximately the same time as the
product of two dense univariate polynomials with the same number of
      terms as 
. An algorithm for
      arbitrary dimensions 
 has been presented in [26], whose cost is approximately 
 times
      larger than the cost of a univariate product of the same size. The
      problem has also been studied in [20, 21] and
      it is often used as a benchmark [15, 37].
    
      What about the techniques from this paper? We first note that the
      exponents of 
 are known, so we can directly focus
      on the computation of the coefficients. In case that we need to do
      several product computations for the same 
 and
      
, we can also spend some time
      on computing a small 
 and vectors 
 for which we know beforehand that we will win our game of
mystery balls. For our simulations, we found it useful to choose each
      
 among a dozen random vectors in a way that
      minimizes the sum of the squares of the number of balls in the boxes
      (this sum equals 
 for the first throw in Figure
      1).
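A sketch of this selection heuristic, assuming the exponents of the product are known (as they are for the dense supports considered here); the candidate generation is ours and merely illustrative.

```python
import random
from collections import Counter

def choose_vector(exponents, r, num_candidates=12):
    """Among a dozen random vectors s, keep the one minimizing the sum of
    the squares of the box occupancies for the map e -> (s.e) mod r."""
    n = len(next(iter(exponents)))
    def cost(s):
        occ = Counter(sum(si * ei for si, ei in zip(s, e)) % r
                      for e in exponents)
        return sum(v * v for v in occ.values())
    candidates = [tuple(random.randrange(r) for _ in range(n))
                  for _ in range(num_candidates)]
    return min(candidates, key=cost)
```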
    
For various , , and , we played our game several times for different triples of vectors . The critical value for  for this specific type of support seems to be close to . In Table 5 below, we report several cases for which we managed to win for this value of . This proves that optimal polynomial multiplication algorithms for supports of this type are almost as efficient as dense univariate polynomial products of the same size.
      
[Table 5]
      The main results of this paper rely on two heuristics
      HE and HC. The first heuristic is the
      most important one and it is in particular satisfied when 
 and 
 are random sparse polynomials:
      indeed, in this case, the distributions are equivalent to those obtained
      by fixing linearly independent 
,
      
, and 
      and then picking monomials at random. In the case of special supports,
      the experiments in section 5.4 indicate that the complexity
      of our algorithms might be even better than for random supports.
      However, the theoretical question remains what can be proved if we
do not assume the heuristics HE and HC.
    
      Without HE and HC, it is easier to
      analyze the probabilistic cost for repeated single throws with a larger
      number of boxes 
. Instead of
      obtaining the full result from three simultaneous throws by playing our
      game of mystery balls, we obtain increasingly good approximations of the
      result as we keep on throwing our balls; see [1, sections
      5, 6, and 7] and [24, section 3.1] for this idea. In this
      section, we present complexity analyses for multiplication algorithms
      that are based on this approach.
    
As in section 4.1, we start with the case when we already know the exponents of the product . In that case, we can adapt the number of boxes  at every throw to the number of unknown coefficients, which also simplifies the probabilistic analysis. As before, the resulting Algorithm 3 is Las Vegas.
      
      Proof. Let 
.
      We claim that 
 at the start (and at the end) of
      the main loop. This is clearly the case when entering the loop for the
      first time. It follows that 
.
      The decomposition 
 is not unique, but the
      coefficient 
 is uniquely determined whenever
      
, in which case 
. This ensures that 
 and
      that we still have 
 at the end of the loop. If
      the algorithm terminates, then it follows that it returns the correct
      result.
    
      Let us now show that 
 contains at least 
 elements with probability 
.
      For each pair 
 with 
,
      let 
 be the set of 
 for
      which 
. Since 
 modulo 
, the
      set 
 forms a hyperplane modulo 
, whence 
.
      For each 
, let 
 be the set of pairs 
 with 
 and 
. Then
    
    
      so the average size of 
 is given by
    
    
      Now for any 
, the probability
      that 
 for a random 
 is
      bounded by 
. In particular,
      we have 
 with probability 
. Since we took 
,
      it follows that 
 and thus 
, for such 
.
    
      Let us next examine the cost of one iteration of the main loop. We can
      compute 
 in expected time 
, as in the note after Theorem 6. As in
      the proof of Theorem 4, the computation of 
, 
,
      and 
 requires 
 bit
      operations. The multiplication 
 can be done in
      time 
. In combination with
      the above bound, the expected cost to halve the size of 
      is therefore bounded by 
.
      Using that 
, we conclude that
      the total expected cost is bounded by 
.
    
Remark 8. 
 of the cyclic polynomial multiplications is just
a constant factor worse. However, the subdominant terms 
      in the complexity of Theorem 4 need to be multiplied by
      
. Whereas 
      remains bounded by 
 for the best known
      multiplication algorithm with 
,
      the term 
 may dominate the cost if 
 and 
 is small in comparison with
      
.
    
Remark 9. 
 whose exponents
      are known, by adapting the algorithms from [24].
    
Let us now examine the case when the exponents are not known. Then, in particular, we have to guess the number of terms of the product 
. We do this by introducing a “tentative
      upper bound 
” for the
      number of “missing terms” 
,
      which is updated along with the approximation 
 of
      
. We start with 
 and, as long as 
,
      we will show that 
 doubles with high probability
      at every iteration.
    
      In order to avoid depending on the heuristic HC, we
      also compute an extra polynomial 
 in step 5 of Algorithm 4. This allows us to replace the
      heuristic by a probabilistic consistency check in step 9c. This only multiplies the complexity by a constant factor.
      Note that the same trick can be used in combination with Algorithm 2.
    
      In order to ease the probabilistic analysis, we first consider the case
      when the polynomials
      
      and
      
      have modular coefficients in
      
.
      
Algorithm 4
Output: the product 
            
        
, 
, and assume that 
. Then with probability at least 
, Algorithm 4 returns the
        correct answer and runs in expected time
      
[equation (7)]
      Proof. We will say that an execution of the
      algorithm is flawless if, throughout the execution, every term
      
 that occurs in 
 also
      occurs in 
. This is the case
      if we only add correct terms to 
 in step 9d. Let us analyze the probability that this is indeed the
      case.
    
      Assume that we arrive at step 9 and that the execution has
      been flawless until that point. For a given 
, let 
 be all terms that
      occur in 
 with 
 modulo
      
. Setting 
      and 
, we note that 
. Since the execution was flawless
      until here, we have 
 and 
. If 
,
then the Schwartz–Zippel lemma [44, 46]
      implies that the probability that we pass the check in step 9c is bounded by 
.
      The probability that the term 
 in step 9d is correct is therefore at least 
 and
      the probability that this is the case for all 
      terms of 
 is at least 
. We conclude that the probability that the
      execution is flawless is at least 
.
    
      Assume next that a flawless execution terminates and let us examine the
      probability that the returned answer is incorrect. So assume that 
 and let 
 be one of the terms
      occurring in 
. Let 
 be all terms that occur in 
 with
      
 modulo 
.
      Let 
 and note that 
.
      Now 
 by our flawless assumption. By the
      Schwarz–Zippel lemma, it follows that the probability that 
 is at most 
.
      This shows that the probability that the algorithm returns an incorrect
      answer is bounded by 
. Since
      the last random choices of the 
 and 
 are independent from the previous ones that led to the
      flawless execution, the overall probability of success is at least 
.
    
      As to the complexity bound, let us again focus on a flawless execution.
      Note that the flawless assumption only involves the random choices of
      the 
 and 
 in step 4;
      the 
 in step 3 are chosen
      independently; we will freely use this observation in the probabilistic
      analysis below. Let us examine the evolution of 
      and 
 during a flawless execution. Clearly, 
 can only decrease. Let 
 and
      
 be the new values of 
 and
      
 after one iteration of the main loop (which
      starts at step 2 and ends at step 11). If 
 is an upper bound for 
,
      then by similar arguments as in the proof of Theorem 7, we
      have 
 with probability 
      (note that we now took 
 instead of 
).
    
      If 
 and 
,
      then we claim that 
 with probability 
. Assume that 
,
      whence 
. Let 
      be the set of pairs 
 of monomials in 
 with 
 (for any total order on the
      exponents) and whose images in 
 under the map
      
 are the same. As in the proof of Theorem 7, we have 
. Now
      the image of a pair 
 in 
      is of the form 
 with 
. For a given 
,
      the minimum of 
 is reached when the monomials are
      “equidistributed” over the “boxes” in 
. In that case, every 
 with 
 has at least 
      pre-images among the monomials in 
.
      Then it follows that
    
    whereas
    
      Consequently, 
, which can
      only happen with probability 
.
      This completes the proof of our claim. Our claim implies that after an
      average 
 number of iterations, we reach a
      situation in which either 
 is doubled or 
 is halved.
    
      Now assume that 
 at a certain point of the
      execution. Then with probability 
,
      we have 
 after one iteration. With probability
      
, we need an average number
      of 
 additional iterations before 
      is either halved or we again reach a point when 
      (here we use the fact that we always have 
 after
      one iteration). Combining these two cases, we see that after an average
      number of 
 iterations, the number 
 of missing terms is halved and 
 is
      again an upper bound for 
.
      Using a similar analysis as in the proof of Theorem 7, it
      follows that the overall time before we terminate (where we count from
      the first moment when 
 holds) is bounded by (7).
    
      It remains to estimate the cost of the initial phase of reaching a point
      when 
 holds. By our claim, we need an average
      number of 
 iterations in order to double 
. The cost of these iterations is
      bounded by 
, as in the proofs
      of Theorems 6 and 7. Summing this bound for
      
 in a geometric progression until 
, the cost of the initial phase is therefore
      again bounded by (7).
    
Mutatis mutandis, we note that Remarks 8 and 9 also apply to Theorem 10 and Algorithm 4. It remains to consider the original problem of determining
      the exponents of 
 in the case when 
 and 
 have coefficients in 
. For this it suffices to examine
      the probability that a random modular reduction of 
      has the same exponents as 
.
    
        
 be a polynomial with 
 terms and 
.
        Let 
 with 
 and let 
 be a random prime number between 
        and 
. Then the probability
        that 
 and 
 have the same
        exponents is at least 
.
      
      Proof. The 
 coefficients
      of 
 are divisible by at most 
      distinct primes between 
 and 
. Let 
 be the number of
      primes below 
. By [43],
      we have
    
    
      for 
. Consequently,
    
    
      for 
. The probability that a
      randomly chosen prime between 
 and 
 does not divide any of the coefficients of 
 is therefore at least 
 where 
. Under our assumption that 
, we have 
.
    
Of course, the variants from section 5 also admit unconditional counterparts. In particular, for supersparse polynomials, the bounds (5) and (6) now become
    
A. Arnold, M. Giesbrecht, and D. S. Roche. Sparse interpolation over finite fields via low-order roots of unity. In Proc. ISSAC '14, pages 27–34. New York, NY, USA, 2014. ACM.
A. Arnold, M. Giesbrecht, and D. S. Roche. Faster sparse multivariate polynomial interpolation of straight-line programs. JSC, 75:4–24, 2016.
A. Arnold and D. S. Roche. Multivariate sparse interpolation using randomized Kronecker substitutions. In ISSAC '14: Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, pages 35–42. New York, NY, USA, 2014. ACM Press.
M. Asadi, A. Brandt, R. Moir, and M. M. Maza. Sparse polynomial arithmetic with the BPAS library. In International Workshop on Computer Algebra in Scientific Computing, pages 32–50. Springer, 2018.
M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In Proc. ACM STOC '88, pages 301–309. New York, NY, USA, 1988.
J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proceedings of the ACM-SIGSAM 1989 International Symposium on Symbolic and Algebraic Computation, pages 121–128. ACM Press, 1989.
C. Fieker, W. Hart, T. Hofmann, and F. Johansson. Nemo/Hecke: computer algebra and number theory packages for the Julia programming language. In Proc. ISSAC 2017, pages 157–164. 2017.
Ph. Flajolet and R. Sedgewick. An introduction to the analysis of algorithms. Addison Wesley, Reading, Massachusetts, 2nd edition, 1996.
T. S. Freeman, G. M. Imirzian, E. Kaltofen, and Y. Lakshman. DAGWOOD: a system for manipulating polynomials given by straight-line programs. ACM Trans. Math. Software, 14:218–240, 1988.
S. Garg and É. Schost. Interpolation of polynomials given by straight-line programs. Theoretical Computer Science, 410(27-29):2659–2662, 2009.
M. Gastineau and J. Laskar. Development of TRIP: fast sparse multivariate polynomial multiplication using burst tries. In International Conference on Computational Science, pages 446–453. Springer, 2006.
J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 2nd edition, 2002.
M. Giesbrecht and D. S. Roche. Diversification improves interpolation. In ISSAC '11: Proceedings of the 36th International Symposium on Symbolic and Algebraic Computation, pages 123–130. ACM Press, 2011.
M. T. Goodrich and M. Mitzenmacher. Invertible Bloom lookup tables. In 49th Annual Allerton Conference on Communication, Control, and Computing, pages 792–799. 2011.
A. W. Groves and D. S. Roche. Sparse polynomials in FLINT. ACM Commun. Comput. Algebra., 50(3):105–108, 2016.
D. Harvey and J. van der Hoeven. Integer multiplication in time O(n log n). Annals of Mathematics, 193(2):563–617, 2021.
H. Hassanieh, P. Indyk, D. Katabi, and E. Price. Nearly optimal sparse Fourier transform. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 563–578. 2012.
J. van der Hoeven. The truncated Fourier transform and applications. In Proc. ISSAC 2004, pages 290–296. Univ. of Cantabria, Santander, Spain, July 2004.
J. van der Hoeven. Ball arithmetic. Technical Report, HAL, 2009. https://hal.archives-ouvertes.fr/hal-00432152.
J. van der Hoeven and G. Lecerf. On the complexity of blockwise polynomial multiplication. In Proc. ISSAC '12, pages 211–218. Grenoble, France, July 2012.
J. van der Hoeven and G. Lecerf. On the bit-complexity of sparse polynomial multiplication. JSC, 50:227–254, 2013.
J. van der Hoeven and G. Lecerf. Sparse polynomial interpolation in practice. ACM Commun. Comput. Algebra, 48(3/4):187–191, 2015.
J. van der Hoeven and G. Lecerf. Implementing number theoretic transforms. Technical Report, HAL, 2024. https://hal.science/hal-04841449.
J. van der Hoeven and G. Lecerf. Sparse polynomial interpolation: faster strategies over finite fields. AAECC, 2024. https://doi.org/10.1007/s00200-024-00655-5.
J. van der Hoeven et al. GNU TeXmacs. https://www.texmacs.org, 1998.
J. van der Hoeven and É. Schost. Multi-point evaluation in higher dimensions. AAECC, 24(1):37–52, 2013.
M. A. Huang and A. J. Rao. Interpolation of sparse multivariate polynomials over large finite fields with applications. In SODA '96: Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms, pages 508–517. Philadelphia, PA, USA, 1996. Society for Industrial and Applied Mathematics.
Q. L. Huang and X. S. Gao. Sparse Polynomial Interpolation with Finitely Many Values for the Coefficients. In V. Gerdt, V. Koepf, W. Seiler, and E. Vorozhtsov, editors, Computer Algebra in Scientific Computing. 19th International Workshop, CASC 2017, Beijing, China, September 18-22, 2017, Proceedings., volume 10490 of Lect. Notes Comput. Sci. Springer, Cham, 2017.
Q.-L. Huang. Sparse polynomial interpolation over fields with large or zero characteristic. In Proc. ISSAC '19, pages 219–226. ACM, 2019.
Q.-L. Huang and X.-S. Gao. Revisit sparse polynomial interpolation based on randomized Kronecker substitution. In Computer Algebra in Scientific Computing, CASC '19, pages 215–235. Springer, 2019.
M. Javadi and M. Monagan. Parallel sparse polynomial interpolation over finite fields. In Proceedings of PASCO 2010, pages 160–168. ACM Press, 2010.
S. C. Johnson. Sparse polynomial arithmetic. SIGSAM Bull., 8(3):63–71, 1974.
E. Kaltofen and L. Yagati. Improved sparse multivariate polynomial interpolation algorithms. In ISSAC '88: Proceedings of the International Symposium on Symbolic and Algebraic Computation, pages 467–474. Springer Verlag, 1988.
M. Kapralov. Sparse Fourier transform in any constant dimension with nearly-optimal sample complexity in sublinear time. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 264–277. 2016.
M. Khochtali, D. S. Roche, and X. Tian. Parallel sparse interpolation using small primes. In Proceedings of the 2015 International Workshop on Parallel Symbolic Computation, pages 70–77. 2015.
H. W. Lenstra, Jr. and C. Pomerance. Primality testing with Gaussian periods. https://math.dartmouth.edu/~carlp/PDF/complexity12.pdf, 2005.
M. Monagan and R. Pearce. Parallel sparse polynomial multiplication using heaps. In ISSAC '09: Proceedings of the 2009 International Symposium on Symbolic and Algebraic Computation, pages 263–270. ACM Press, 2009.
M. Monagan and R. Pearce. Sparse polynomial multiplication and division in Maple 14. Available from http://cgi.cecm.sfu.ca/~pborwein/MITACS/highlights/SDMPmaple14.pdf, 2009.
H. Murao and T. Fujise. Modular algorithm for sparse multivariate polynomial interpolation and its parallel implementation. JSC, 21:377–396, 1996.
C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
D. S. Roche. Chunky and equal-spaced polynomial multiplication. JSC, 46(7):791–806, 2011.
D. S. Roche. What can (and can't) we do with sparse polynomials? In Proc. ISSAC '18, pages 25–30. New York, NY, USA, 2018. ACM.
B. Rosser. Explicit bounds for some functions of prime numbers. American Journal of Mathematics, 63(1):211–232, 1941.
J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. JACM, 27(4):701–717, 1980.
T. Yan. The geobucket data structure for polynomials. JSC, 25(3):285–293, 1998.
R. Zippel. Probabilistic algorithms for sparse polynomials. In Proc. EUROSAM '79, pages 216–226. Springer, 1979.