> <\body> <\hide-preamble> |<\doc-note> Grégoire Lecerf and Arnaud Minondo have been supported by the French NODE project. Joris van der Hoeven has been supported by an ERC-2023-ADG grant for the ODELIX project (number 101142171). <\wide-tabular> ||||||| ||LOGO_ERC-FLAG_FP.png>|61.62pt|28.51pt||>>>>> ||<\author-affiliation> Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161) CNRS, École polytechnique, Institut Polytechnique de Paris 1, rue Honoré d'Estienne d'Orves 91120 Palaiseau, France |>>||<\author-affiliation> Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161) CNRS, École polytechnique, Institut Polytechnique de Paris 1, rue Honoré d'Estienne d'Orves 91120 Palaiseau, France |>>||<\author-affiliation> Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161) CNRS, École polytechnique, Institut Polytechnique de Paris 1, rue Honoré d'Estienne d'Orves 91120 Palaiseau, France |<\author-email> minondo@lix.polytechnique.fr >>|>||<\doc-note> > > Let > be an effective ring. In other words, elements of > can be represented on a computer and we have algorithms for all ring operations. Let ,\,x|)>> be a system of coordinates and let \x>*\*x>> for any ,\,e|)>\\>. A in over> is an expression of the form <\equation> f=i\t>c*x>, with \\> and \\> for ,t>. We write \1\i\t,c\0|}>> for the of. Given a vector ,\,f|)>> of sparse polynomials, we may consider the polynomial map <\eqnarray*> >|>|>>|,\,a|)>>|>|=,\,f|)>.>>>> A natural question is how to evaluate this map as efficiently as possible. Since such sparse polynomial maps only involve additions, multiplications, and possibly subtractions, it is also natural to search for an evaluation algorithm in the form of a straightline program(SLP). We recall that an SLP is a sequence of arithmetic instructions, without any branchings or loops; see the definition in for instance, or the one from for modern implementation aspects. The goal of this paper is to efficiently compute a shortSLP for the evaluation of a sparse polynomial map. Whereas the size of an SLP is easily measured by the number of operations in >, there are several natural measures for the size of a sparse polynomial(): its number of terms \t>, its expression size <\eqnarray*> >|>|i\t,1\j\n,|)>\0>1,>>>> or its bit size <\eqnarray*> >|>|i\t,1\j\n,|)>\0>|log|)>|\>+1|)>.>>>> (For simplicity, we always regard coefficients in > as having unit size.) More generally, for a vector ,\,f|)>> of sparse polynomials, we define \t>+\+t>>, \s>+\+s>>>, and \b>+\+b>>. In this case, one might also consider size measures that take into account common monomials > or terms > in distinct polynomials \f>. It is not hard to see that |)>> operations always suffice for the evaluation of (using binary powering), whereas 2*-1|)>> binary sums and products are generally necessary for dense polynomials. This lower bound is reached when ,d|}>\\\,d|}>> by using a multivariate Horner scheme. Finding an SLP of optimal size is a very difficult problem(see recent advances in), due to the fact that small SLPs like +2*x-x|)>>-+x-4*x|)>>> may become very large when expanding them as sparse polynomials; see also Remark below. Our goal in this paper is to design an algorithm with an optimal or provingly good complexity in all cases. We will rather describe an easy to implement algorithm that performs well in practice, as illustrated on both random polynomials and a database with more than600 examples. It turns out that the computed SLP is often of size |)>>, whereas the computation of the SLP always takes at most *t|)>> operations, where > is the largest number of variables that occur in a single term of . We also note that the degrees of most polynomials in our database are modest. In section we start with a review of existing approaches. Already in the case when the input polynomials > are simple powers or monomials, it is an interesting question how to efficiently find a short SLP for their (joint) evaluation. Classical algorithms for these cases are due to Brauer, Straus, Yao, and Pippenger; we refer to for a nice survey. For the general case, the most common implementation strategy is based on multivariate Horner schemes; see also sections and. But more dedicated bilinear schemes4.4.3> or expensive partial factorization schemes have also been studied: see sections and. In section we present our new algorithm. It takes advantages of various ingredients. We first reduce to the case when all terms > occurring in the > are simple products (with and \> for ,n>). We next rely on a fast greedy method for the efficient evaluation of multiple products. We pursue with a simple greedy strategy to factor out some of the variables. The resulting SLP may finally be further simplified using common subexpression elimination and combining some of the additions and multiplications into\Pfused multiply-add\Q (FMA) instructions, which can be used to quickly evaluate the final SLP on modern hardware. In our last section we report on our implementation of the algorithm of section inside JIL and show that it is well suited for sparse polynomials that are typically encountered in computer algebra. We complete our implementation by combining efficient strategies to handle polynomials across the full spectrum from sparse to dense. In this section, we start with a review of known methods for evaluating monomials and polynomials. One important special case that has been studied extensively in the literature concerns the computations of powers > with \>. Brauer gave the first algorithm to compute> using |)>*lg k/lg lg k> multiplications, where |log k+1|\>>. The idea is to compute the >-adic expansion <\equation*> k=\>k*2 of for |lg lg k*|)>|\>>. Here > stands for a suitable function with \1> for all and which tends very slowly to zero. One may for instance take \1/>. Then computing > for ,2-1>> takes time >. After that, setting \j>k*2>, computing >=x>\>|)>>> in terms of>> requires squarings and one multiplication. Since \>, the stated complexity bound follows. Erd®s proved that this bound is essentially optimal. In general, the computation of > using only multiplications can be modeled by an ,\,k>> for . This means that >>, where \\\k>> are positive integers such that =1> and for each 1>, there exist \i\i> with =k>+k>>. In practice, we may compute addition chains of small length for all integers until as follows. We start with the trivial addition chain for . Assuming that we have computed addition chains for all integers below, we determine the smallest \k> such that the addition chain ,\,k>> for > is of smallest length with \,\,k>|}>>. Then sorting ,\,k>,k-k> yields an addition chain for . For each , we actually only need to store the corresponding > in atable in order to efficiently recover the addition chain. The algorithm can also be represented by a labeled tree4.6.3> in which the node labeled by has the node labeled by > as its parent: <\equation*> |>||>>|>>>> For instance, is an addition chain for . In our implementation, we precompute this table (or tree) up to a certain threshold. For above the threshold we resort to a >-adic method as above for a suitable value of (see alsobelow). The >-adic approach from the previous subsection generalizes to the computation of multiple powers instead of a single one. It was first shown by Yao thatpowers =x>,\,f=x>> can be computed in time <\equation*> lg k+m*|)>*lg k/lg lg k, where ,\,k|)>>. Yao's method consists of first computing ,\,x>>> with \max |log k|\>,\,|log k|\>|)>> and then to call the following algorithm times for the computation of >> for ,m>: <\vgroup> <\specified-algorithm> in a multiplicative monoid >, \> such that 2> and ,\,x|log k|\>>>>. >. <|specified-algorithm> <\enumerate> Let |log log k-3*log log log k|\>>, 2> and let i\t>k*B> stand for the adic expansion of , with \,B-1|}>>. For ,B-1> compute \i\t,k=j>x>>. Return j\B>y>. The condition 2> is used to ensure that 1>, but steps2 and 3 can still be applied to lower values of , by taking 1> or 2> instead of the value from step1. If , then the average number of bits of is about k|2>>, so Algorithm performs about k|2>-1>> products. If, then the average number of digits > equal to a given is roughly k|8>>, so the number of products is k|8>-1|)>> in step2, and then in step3. Consequently, choosing is advantageous over using only when <\equation*> 3* k|8>-1|)>+5\ k|2>-1, which means 2>. The number of products performed by Yao's method for > to compute powers with random exponents in the range ,\,2-1|]>> for ,32> is shown in Figure. The abscissa represents, and the ordinate displays the number of multiplications divided by (averaged over 100 runs).|gr-frame|>|gr-geometry|||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>|>|>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>|>|||>||>||>||>|>|>|>>>> Computation of several powers using Yao's method and Algorithm. > The original condition 2> and the weakened condition 2>> show that Yao's method is mostly of theoretical interest. For practical purposes, we therefore designed Algorithm below, based on >\adic expansions. It incorporates opportunistic optimizations whenever possible. As shown in Figure, this algorithm improves upon Yao's method for the exponent sizes that we are interested in. <\specified-algorithm> ,\,k\\> with k\\\k>. ,\,a>\\> is an addition chain with ,\,k|}>\,\,a>|}>>. <|specified-algorithm> <\enumerate> If and \2>, then use the algorithm from section. Let |lg lg k*|)>|)>|\>>. Apply the algorithm recursively to the sorted elements of <\equation*> ,2|}>\ rem 2\1\i\m|}>\. Let ,\,r>> be the result. Apply the algorithm recursively to the sorted elements of quo 2\1\i\m|}>\>>. Let ,\,q>> be the result. Sort the list ,\,r>,2*q,\,2*q>,k,\,k>, remove multiple entries, and return the result. <\example> Taking ,k,k|)>=>, Algorithm returns the addition chain . An optimal addition chain is . <\remark> Given ,\,k>, the problem of finding the absolute shortest addition chain ,\,a>> with ,\,k|}>\,\,a>|}>> is known to be NP-complete. The >-adic approach can also be used for the computation of power products. This was already realized by Straus, who showed that >*\*x>> can be computed using |)>*lg k/lg lg k> multiplications, where ,\,k|)>>, and provided that >. The similarity between the complexities of Straus' and Yao's algorithms is not accidental, since both problems are equivalent due to a suitable application of the transposition principle. For more details and historical references, we refer to Bernstein's survey. Another interesting observation due to Straus is that, for the computation of a single power product >*\*x>>, the naive strategy to compute and multiply >,\,x>> is typically not the fastest. In our implementation, for small exponents, we found it simpler to simultaneously replace all powers > that occur in our vector ,\,f|)>> of sparse polynomials by new variables. For each variable >, this requires us to call Algorithm for the list of all exponents such that > occurs in . Consequently, we do not benefit from the additional improvements that Straus provides for large exponents; however, we describe an alternate practical strategy in section. In fact, Pippenger also proposed an algorithm for evaluating multiple power products =x>*\*x>,\,f=x>*\*x>>. This algorithm is more elaborate to implement and we again refer to Bernstein's survey for an overview. In section, we will present an efficient and easily implementable alternative for a key step in Pippenger's algorithm for the case when \> for all . Let us now return to the problem of computing an SLP for evaluating a single sparse polynomial(). In computer algebra systems, sparse polynomials are often represented recursively as polynomials in a single variable >, whose coefficients are sparse polynomials in the remaining variables ,\,x,x,\,x>. This allows us to evaluate each of the univariate polynomials in this representation using Horner's method (or using avariant of Horner's method if the univariate polynomials are themselves represented in a sparse manner). Another option is to decompose *f+\+x*f> with \> and where some of the terms may be zero and left out. We may then recursively expand ,\,f> in a similar manner. Of course, we may also impose additional constraints on ,\,f>. For instance, if > is our main expansion variable as above, then we may require ,\,f,f,\,f> not to depend on >. Such multivariate Horner schemes have been studied in several works. One major issue concerns the choice of the main expansion variable > and the subsequent choices of the main expansion variables for all recursive coefficients. In lucky cases, typically when the support of is rather dense, multivariate Horner schemes may return an SLP of size |)>>. However, on many examples from our data base, the size of the computed SLP is closer to >. Multivariate Horner schemes nonetheless remain a good\Pnaive way\Q to evaluate sparse polynomials, since they are easy to implement and always outperform brute force evaluation of the expression(). Consider a sparse polynomial c*x>> with support \1\i\t|}>\\>. If has a\Pdense flavor\Q, then it is often possible to subdivide the set of variables ,\,x|}>> into two subsets, say =,\,x|}>> and =,\,x|}>>, in such a way that the projected supports <\eqnarray*> >|>|,\,\|)>\\\S|}>>>|>|>|,\,\|)>\\\S|}>>>>> are significantly smaller than . For instance, if is a generic polynomial in variables of total degree , so that ,\,\|)>\\\\+\+\\d|}>>, then => and |\|>=|\|>=>. If is fixed and \>, then this yields \d/n!> and <\equation*> |\|>=|\|>\d/k!\. In this favorable situation, we may first compute the power products >*\*x>> for all \S>, as well as the power products >*\*x>> for all \S>, and then deduce all power products >> for ,t> using only additional multiplications. Assuming that |\|>+|\|>=o>, the entire evaluation of can then be done using |)>*t> multiplications and FMA instructions; Here an FMA instruction is of the form a*b\c>. One may view this method as re-interpreting as a bilinear polynomial in the variables >*\*x>> and >*\*x>>. <\remark> Although this scheme is not expected to be efficient for general purposes, it becomes very attractive whenever the ring operations in > can be performed in asuitable FFT model or using modular arithmetic. In that case, the transforms (FFT transforms or modular reductions) of the coefficients > can be precomputed. For the evaluation of , we then compute the power products >*\*x>> and >*\*x>> as above, along with their transforms. We next do the multiplications >*\*x>|)>\>*\*x>|)>> in the transformed model, as well as all the FMAs. We finally transform the result back to the final value in >. The bulk of the computation then reduces to multiplications and FMAs in the transformed model. This technique was proposed in4.4.3>. <\example> Assume that =\/|)>> is a ring of power series over a field > and assume that the > are actually in >. Assuming that > contains a primitive root of unity> of smooth order \2*r>, then we may use FFTs of length > to transform elements of > into vectors in >>. If the > are in >, then we rather need an > of smooth order \3*r> instead. If we subdivide our set of variables into \2> instead of two sets, then similar ideas still apply, but we need an > of smooth order \+1|)>*r>. Similar remarks apply for modular arithmetic. The smallest SLP that evaluates a given sparse polynomial may be much smaller than the input polynomial. One example is the polynomial <\equation> f=+2*x-x|)>>-+x-4*x|)>> from the introduction, where +2*x-x|)>>> and +x-4*x|)>>> can be computed using repeated squarings. Unfortunately, finding such SLPs from their expanded representations is very hard in general. Nonetheless, it is often possible to find at least some partial factorizations that may help to speed up the evaluation. The problem of factoring sparse polynomials in the traditional mathematical sense has been studied extensively. We refer to for some recent algorithms and historical references. However, the kind of \Pfactorizations\Q as in() that we are really after are not exclusively multiplicative, but may involve both additions and multiplications. In fact, the multivariate Horner schemes that we discussed in section can be regarded as easy-to-compute additive-multiplicative factorizations, by recursively factoring out single variables. We will also use this kind of factorizations in section below. In, a reasonably efficient algorithm has been presented for finding \Psyntactic factorizations\Q, in which none of the terms of the expanded products and sums overlap. For instance, *+a*> is a syntactical factorization of +a*b+a*c+a*d+b*c+b*d>, but *+*> is not a syntactical factorization of , because of the two \Poverlapping\Q terms and . The algorithm from works well for applications in which syntactic factorizations are likely to occur, such as input polynomials that are themselves the expanded result of some algebraic computation (in a generic resultant was used as the running example). In this paper, we focus on SLP transforms which can be computed in quasi-linear time or almost, so we regard the computation of more clever additive-multiplicative factorizations as amore or less independent problem to which we plan to return in future work. Let ,\,x|)>> be the input variables. Consider a vector ,\,f|)>> of sparse polynomials with <\equation*> f=j\t>c*x>,i=1,\,m. In this section, we present our main algorithm to compute an SLP for the efficient evaluation of \\\> as a polynomial map. In brief, our algorithm proceeds as follows: <\enumerate> All powers of individual variables are evaluated and replaced with new variables; coefficients are also replaced by new variables. Consequently, each term of the polynomial is expressed as a product of distinct variables. Horner\type factorizations are applied to the polynomials obtained in step1. We collect all remaining products of distinct variables that need to be evaluated and use a greedy algorithm to evaluate them simultaneously. The final straight\line program is then simplified and optimized for the target . The first step of our algorithm is a reduction to the case where =1> and \> for all . We do this by first introducing new variables ,x,\,x> for all constants \1>. Whenever a constant occurs multiple times, we understand that we introduce a single variable to represent all these identical constants. For each variable >, we next collect the set =|)>\1\i\n,1\j\t|}>> of all non-zero exponents > such that >> occurs in one of the terms of one of the >. All sets > can be determined jointly using a hash table and one linear pass. Whenever \\>, we next use Algorithm to compute an addition chain for the sorted list of elements of>. This addition chain gives rise to an SLP of the same length minus one for the computation of>> for all \>. For each >> with \1>, we introduce a new variable ,x,\>, and use the latter SLPs to compute these new variables as a function of ,\,x>. <\example> As our running example, consider and <\equation> ||>|>|*x*x+x*x*x*x+2*x*x*x>>|>|>|*x*x+x*x*x*x+2*x*x*x*x+3*x*x*x*x.>>>>> We first introduce new variables \2> and \3> for the constants. We next need to introduce new variables for all non-trivial powers. For the variable >, we need to compute > and >, which we do using Algorithm. This yields \x\x>, \x\x>, \x\x>>, so \x> and \x>. We next need to compute >, >, >, and >, which is done as follows: \x\x>, =x\x>, \x\x>, \x\x>, \x\x>, \x\x>. We finally need to compute > and >, which is done using \x\x>, \x\x>. After the introduction of these new variables, our sparse polynomials are rewritten as <\equation> ||>|>|*x*x*x+x*x*x*x+x*x*x*x>>|>|>|*x*x*x+x*x*x*x+x*x*x*x*x+x*x*x*x*x.>>>>> We have just shown how to reduce our evaluation problem to the case where all input polynomials <\equation*> f=i\t>c*x>*\*x> are sums of products of distinct variables. Instead of introducting a distinct variable for each individual power >, it is also possible to further factor such powers as products of powers of the form >>. For this, we compute the -adic expansions of the exponents <\equation*> e=0>e*2. Setting \x>>, we then have <\equation> f=i\t>c*0>y>. As in the previous subsection, we introduce new variables for the coefficients \1>, whereas the evaluation of the > is done directly using binary exponentiation instead of Algorithm. After applying one of the two rewriting algorithms described in the previous subsections, the input polynomials ,\,f>> become sums of products of variables. Although finding the best possible partial factorizations is a hard problem in general, we still can go for \Plow hanging fruit\Q, by recursively factoring out single variables > from the polynomials > as long as possible. More precisely, for each polynomial >, let > be a variable such that the number of terms > that depend on > is maximal. If \1> and \\*t> for some small number \0> (say \1/10>), then we decompose =f*x+f> and replace > by the two polynomials> and > in the list ,\,f> (or just by > if =0>). We keep doing this until no such > exists. Due to the threshold \0>, this algorithm runs in quasi-linear time. <\example> Applying this strategy to the polynomials() leads to the factorizations <\equation*> ||>|>|*x*x+x*x*x|)>*x+x*x*x*x>>|>|>|*x+x*x*x|)>*x+x*x*x|)>*x+x*x*x*x*x.>>>>> The algorithm of the next subsection will compute the values of *x*x>, *x*x>, *x*x*x>, *x>, *x*x>, *x*x>, *x*x*x*x> into the variables >, >, >, >, >, >, >; see Example. In order to compute > and > it remains to perform the following instructions: <\eqnarray*> >|>|+x>>|>|>|,x,x|)>>>|>|>|+x>>|>|>|,x,x|)>>>|>|>|,x,x|)>.>>>> After this, we have =x> and =x>. Here \a*b+c>. In this subsection, we turn to the core subalgorithm. As input, we have a list ,\,>> of pairwise distinct subsets of ,\,x|}>>. As output, we produce anSLP that takes ,\,x>> as input and that computes >v> for ,\>. Without loss of generality we may assume that k\\>|\|>|)>>. For the construction of our SLP we use a simple greedy approach: as long as the list ,\,>> contains two entries > with |\|>\2>, we determine variables>,> for which k\\\,x|}>\|}>|\|>>> is maximal. We then introduce a new variable \x\x>> and replace ,x|}>> by |}>> in the subsets ,\,>> and increase by one. As soon as |\|>=1> for all , the products >v> all become trivial. The algorithm runs as follows. <\specified-algorithm> pairwise distinct subsets ,\,>> of ,\,x|}>>. an SLP for the computation of \ >v> for ,\>. <|specified-algorithm> <\enumerate> Start a new SLP . Create a binary search tree that associates to each pair ,x|)>> with j> and ,x|}>\> for some , the set ,x|]>> of all with ,x|}>\>. Create a heap with triples ,x,x|)>>, where =,x|]>|\|>>, ordered via >. As long as the heap is non-empty, do the following: <\enumerate> Extract and remove the highest triple ,x,x|)>> from . If \,x|]>|\|>>, then continue the loop (with the next highest triple from ). Create a new variable > and append the instruction \x\x> to . For all T,x|]>>, update \\|}>\,x|}>>, update the entries >,x>|]>>> with >,x>\> and such that at least one among >> and >> is in ,x,x|}>>, and insert all new triples ,j>,x>,x>|)>> into . Update n+1>. For ,l>, append the naive evaluation of >v> to , and use this product as the -th output value of . Return . <\proposition> Algorithm is correct and runs in time k\\>|\|>|)>>. The SLP in return performs at most k\\>|\|>-1|)>> products. <\proof> Let k\\>|\|>>. Steps2 and3 perform k\\>|\|>|)>*log S|)>> operations on integers of bit size =O>. If \,x|]>|\|>> holds in step4b, then the triple is no longer valid. Let denote the value of at the end of the algorithm. The bit cost of each update caused by > in step4d is |\|>*log S*log m|)>>. Moreover, each > can be updated at most |\|>> times. Consequently, >. Hence, the total cost is k\\>|\|>|)>>. <\remark> In practice a hash table is used for in Algorithm. Assuming perfect hashing the asymptotic complexity is the same. <\example> Continuing our Example, let us take <\equation> ||||>|>|,x,x|}>>|>|>|>|,x|}>>>|>|>|,x,x|}>>||>|>|,x,x|}>>>|>|>|,x,x,x|}>>||>|>|,x,x|}>>>|||||>|>|,x,x,x,x|}>.>>>>> The two variables > and > appear jointly in two of the >. The algorithm starts by picking ,x|)>> in step4a, so we append \x\x> to the SLP, and also update \,x|}>>>, \,x|}>>, and the table . We next pick ,x|)>> in step4a, append \x\x> to the SLP, and also update \,x,x|}>>, \,x,x,x|}>>, and the table . We next pick ,x|)>> from the heap, but since ,x|]>> has been updated to ,x|]>=1>, we ignore this pick in step4b and directly continue with the next triple on the heap. From this point on, we only retrieve triples ,x,x|)>> with =1> and further append the following instructions to the SLP (while indicating their values in terms of ,\,x> in grey): <\equation*> ||||||||>|>|\x>||*x>|>|>|>|\x>||*x*x>>|>|>|\x>||*x>||>|>|\x>||*x*x*x>>|>|>|\x>||*x*x>||>|>|\x>||*x*x*x>>|>|>|\x>||*x*x>||>|>|\x>||*x*x>>|>|>|\x>||*x*x>||>|>|\x>||*x*x*x*x>>|>|>|\x>||*x*x>||>|>|\x>||*x>>>>> The output variables are >, >, >, >, >, >, >. As our final step, we try to combine as many additions with multiplications as possible into FMA instructions, and run common subexpression elimination on the result. Optionally, we may reschedule some of the instructions and try to reduce the number of variables that are being used. <\example> For our running Example, it is possible to group two pairs of additions and multiplications into FMA instructions. We may also rewrite multiplications \x\x> as squarings \x> (which is more efficient for certain rings >). On the other hand, common subexpression elimination leads to no further simplifications. The final SLP is: <\equation*> ||||||||||||,\,x|)>>||||||||||>|>|>||>|>|>|>|>|>|>|\x>>|>|>|||>|>|>||>|>|\x>>|>|>|>||>|>|\x>||>|>|\x>>|>|>|\x>||>|>|\x>||>|>|\x>>|>|>|>||>|>|\x>||>|>|\x>>|>|>|>||>|>|\x>||>|>|,x,x|)>>>|>|>|>||>|>|\x>||>|>|,x,x|)>>>|>|>|\x>||>|>|\x>||>|>|,x,x|)>>>|>|>|\x>||>|>|\x>||>|>|,x,x|)>>>|>|>|\x>||>|>|\x>||>|>|,x,x|)>>>|||||||||,x|)>>||>>>> We implemented our new algorithm within the library, version >. is a free software library for HPC computations with SLPs. It provides a convenient interface for automatic differentiation, common sub-expression elimination, lifting, transposition, and many other features. It also includes a (Just In Time) compiler dedicated to SLPs. This compiler can generate executable machine code directly in memory, which is useful when the same SLP needs to be evaluated 1> multiple times efficiently. Typical applications include numerical integration and polynomial system solving. The JIT compiler of is one or two orders of magnitude faster than general\purpose compilers such asGCC or CLANG. Consequently, JIT compilation becomes advantageous even for relatively small values of like 1000>. In addition, if several competing SLPs are available for the same task, then we can generate machine code for each of them and empirically select thebestSLP. The source code of our evaluation can be found in the file of . Examples are in and benchmarks in . Currently, our implementation supports 32-bit integer exponents represented by sparse vectors. The SLPs produced by our software include the following operations: negation, binary addition, subtraction, and possibly negated multiplication a*b>, as well as ternary FMA instructions a*b\c>. Our goal is to minimize the length ( the number of operations) of the generated SLP. Below we analyze the obtained results for the following strategies. <\description> This strategy corresponds to our new polynomial evaluation algorithm from section, while using the algorithm from section for the reduction to the case of sums of powers of distinct variables. This is a variant of the \Psparse\Q strategy in which we use the algorithm from section instead of the one from section. Here, the input polynomial is considered in ,\,x,x,\,x|]>|]>>, where maximizes the number of distinct exponents \\\r> of > among the non-zero terms of : <\equation> f=j\s>g,\,x,x,\,x|)>*x>. We apply this Horner strategy recursively to evaluate all the ,\,x,x,\,x|)>>. Then the univariate Horner scheme is applied to obtain the value of using <\equation*> f=x>*,\,x,x,\,x|)>+x-r>*,\,x,x,\,x|)>+\|)>|)>. The powers >,x-r>,\,x-r>> are computed using Algorithm. We have implemented an improved variant of the strategy presented in3>: selecting the variable > as in the above \PHorner\Q strategy, we write <\equation*> f=g,\,x,x,\,x|)>+g,\,x|)>*x, where > contains all the terms of that are not multiple of > and > is not by>. The polynomials > and > are evaluated recursively. We collect all powers> from all recursive evaluations upfront and evaluate them using >. The strategies from section were designed specifically for polynomials with few terms of small degree. In particular, we considered the case of random polynomials with terms in > variables and exponents in >. Figure displays the length of the generated SLP divided by as a function of . The cost is averaged over 10 random samples. The costs of the \Psparse\Q and \Pexponent expansion\Q strategies are actual identical in this special case. Those of \PHorner\Q and \Pgreedy Horner\Q are almost the same.|gr-frame|>|gr-geometry|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>|>|>||>>||>>||>>||>>||>>||>>|>|>|, |>|, |>||>| of terms|>>>> Ratio as a function of , where is the length of the SLP generated using different strategies, for random polynomials with terms in variables and exponents in >. > Figure displays similar cost measures with and exponents 10> constructed randomly as follows: we first select a bit size in > at random and then pick a random number uniformly in ,min,10|)>|]>>. We observe that the \Psparse\Q strategy is ineffective. With fewer than about 50 terms, the \Pexponent expansion\Q strategy performs best; thereafter, the \PHorner\Q strategy (which coincides with the \Pgreedy Horner\Q strategy) becomes more efficient. With a few variables (e.g., 2, 3, 4), the comparisons are similar, with larger thresholds ( becomes approximately for ).|gr-frame|>|gr-geometry|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>|>|>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>|>|>||>||>|, |>||>| of terms|>>>> Ratio as a function of , where is the length of the SLP generated using different strategies, for random univariate polynomials of degree 10> with terms. > Figure displays the time needed to build the SLPs using the four strategies: we construct polynomials in variables with at most 100 terms and random exponents in>>. Coefficients are random 64 bit integers. The cost is averaged over 10 random samples. We observe that the Horner strategies are considerably less efficient than the sparse strategies when the number of variables is large.|gr-frame|>|gr-geometry|||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>|>|>||>>||>>||>>||>>|10|>|>|>||>||>||>||>|>>>> Times for builing SLPs of random polynomials with terms and coefficients in>. > The \PHorner\Q and \Pgreedy Horner\Q strategies coincide for univariate polynomials and are usually the most efficient. The \Pexponent expansion\Q method is only slightly slower, while the \Psparse\Q strategy is suboptimal in this situation. Figure shows averaged ratios as a function of (for random samples), this time for random bivariate dense polynomials of partial degrees 31>. The behavior is roughly similar to what we observed in the univariate case. For trivariate polynomials of partial degrees 9>, the four strategies have closer efficiencies. <\big-figure||gr-frame|>|gr-geometry|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>|>|>||>>||>>||>>||>>|>|>||>||>||>||>||>| of terms|>>>> Ratio as a function of , where is the length of the SLP generated using different strategies, for random bivariate polynomials of partial degrees 31>. Considering the relative performance of the four strategies described above, and noting that the construction of an SLP via the Horner methods tends to become more expensive for sparse polynomials, we retained a \Pcombined\Q default strategy for the evaluation of sparse polynomials in JIL. More precisely, our implementation selects the shortest SLP among the ones produced via the following strategies: <\enumerate> The first strategy is \Pexponent expansion\Q. The second strategy performs a single step of the \PHorner\Q strategy (i.e. with respect to a single variable), by the \Pexponent expansion\Q strategy (where we evaluate all the > of simultaneously). The third strategy performs two steps of the \PHorner\Q strategy, followed by the \Pexponent expansion\Q strategy: after the first step of the \PHorner\Q strategy and we evaluate all the > of simultaneously using the second strategy. All the > are considered as univariate polynomials in the same variable >, where maximizes the number of distinct exponents of > among the union of the non-zero terms of ,\,g>>. Of course, we only apply this third strategy when the second strategy produces ashorter SLP than the first one. This \Pcombined\Q strategy is almost always best for sparse multivariate polynomials. For denser polynomial, it still performs well, since the density is typically captured by one and sometimes two of the variables. The most expensive step of computing the SLP using the \Pcombined\Q strategy is Algorithm. The other steps run in quasi\linear time. To benchmark the efficiency of our \Pcombined\Q strategy, we collected polynomial systems with integer coefficients primarily from the project and the example suite. These systems are available at . We measured the lengths >>, >>, and >> of the SLPs produced by the \Pcombined\Q, \PHorner\Q, and \Pgreedy Horner\Q strategies. In Figure, each polynomial system is represented by across: the abscissa shows the logarithm of the bit size of the system, and the ordinate shows >,L>|)>/L>>.|gr-frame|>|gr-geometry||>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>|>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>||>>|>|>||>>||>>||>>||>>|>|>|>,L>|)>|L>>>|>| of bit sizes>|>>>> Relative performance of the \Pcombined\Q strategy across the database. > After a closer examination of the timings, we observed that relatively dense homogeneous bivariate polynomials are better suited for the \Pexponent expansion\Q method than for Horner's method. On the other hand, the points with low ordinates in Figure often correspond to systems with a very specific structure, for example those labeled in the database. These rare cases are where the \Pgreedy Horner\Q method outperforms \Pcombined\Q. Of course, in our software implementation, the user can select the strategy and even invoke an intensive search for optimal combinations. We are also investigating variants of our algorithms that use more aggressive factorizations in the same vein as. We plan to detail this work in a future paper. <\bibliography|bib|tm-plain|> <\bib-list|24> A.Ahlbäck, J.vander HoevenG.Lecerf. JIL: a high performance library for straight-line programs. , 2025. D.J.Bernstein. Pippenger's exponentiation algorithm. Available at , 2002. J.Berthomieu, Ch.EderM.Safey El Din. Msolve: a library for solving polynomial systems. , ISSAC'21, 51\U58. ACM. A.Brauer. On addition chains. , 45(10):736\U739, 1939. P.Bürgisser, M.ClausenM.A.Shokrollahi. , 315. Springer-Verlag, 1997. J.CarnicerM.Gasca. Evaluation of multivariate polynomials and their derivatives. , 54(231\U243), 1990. M.CeberioV.Kreinovich. Greedy algorithms for optimizing multivariate Horner schemes. , 38(1):8\U15, 2004. D.E.Knuth . . Addison-Wesley, 3>, 1997. A.DeminJ.vander Hoeven. Factoring sparse polynomials fast. Complexity>, 88:101934, 2025. P.Downey, B.LeongR.Sethi. Computing sequences with addition chains. Comput.>, 10(3):638\U646, 1981. P.Erd®s. Remarks on number theory III. On addition chains. , 6(1):77\U81, 1960. H.-G.Gräbe. The SymbolicData project. 2020. . J.vander Hoeven. Calcul analytique. , 1, 1\U85. CIRM, 2011. . J.vander Hoeven. . Scypress, 2020. J.vander HoevenG.Lecerf. Towards a library for straight-line programs. , 37:331\U387, 2026. J.vander Hoeven etal. GNU TeXmacs. , 1998. R.Ilango. Constant depth formula and partial function versions of MCSP are hard. Comput.>, 53(6):FOCS20\U317, 2022. C.E.Leiserson, L.Li, M.Moreno MazaY.Xie. Efficient evaluation of large polynomials. K.Fukuda, J.vander Hoeven, M.JoswigN.Takayama, , 6327, 342\U353. Springer, Berlin, Heidelberg, 2010. V.Ya Pan. Methods of computing values of polynomials. , 21:105\U136, 1966. N.Pippenger. On the evaluation of powers and related problems. , 258\U263. IEEE Computer Society, 1976. N.Pippenger. The minimum number of edges in graphs with prescribed paths. , 12(1):325\U330, 1978. N.Pippenger. On the evaluation of powers and monomials. Comput.>, 9(2):230\U250, 1980. E.G.Straus. Addition chains of vectors (problem 5125). , 71(7):806\U808, 1964. A.C.-C.Yao. On the evaluation of powers. Comput.>, 5(1):100\U103, 1976. <\initial> <\collection> <\attachments> <\collection> <\associate|bib-bibliography> <\db-entry|+1PlYSeg92FURz2kB|book|TeXmacs:vdH:book> <|db-entry> > <\db-entry|+1PlYSeg92FURz2kC|misc|TeXmacs:website> <|db-entry> > > <\db-entry|+2MBfjYgs1WYemMtm|book|BuClSh1997> <|db-entry> M. M. A. > <\db-entry|+2IQzqOKA2nc|article|vdH:slp> <|db-entry> G. > <\db-entry|+1tFen0QUmo0|article|Pan1966> <|db-entry> > <\db-entry|+1InjwFHiJF9|article|Ilango2022> <|db-entry> > Comput.> <\db-entry|+1i3gcST5rYK|article|Br39> <|db-entry> > <\db-entry|+1InjwFHiJF6|article|Str64> <|db-entry> > <\db-entry|+1InjwFHiJF7|article|Yao76> <|db-entry> > Comput.> <\db-entry|+1i3gcST5rYQ|inproceedings|Pipp76> <|db-entry> > <\db-entry|+1InjwFHiJF3|article|Pipp78> <|db-entry> > <\db-entry|+1InjwFHiJF5|article|Pipp80> <|db-entry> > Comput.> <\db-entry|+1InjwFHiJFC|misc|Bern02> <|db-entry> > > <\db-entry|+1i3gcST5rYL|article|CG90> <|db-entry> M. > <\db-entry|+1i3gcST5rYM|article|CK04> <|db-entry> V. > <\db-entry|+3B0hZuXw5Og|inproceedings|vdH:jncf> <|db-entry> > > <\db-entry|+1InjwFHiJF2|inproceedings|LLMX10> <|db-entry> E. L. M. Moreno Y. > J. van der M. N. > <\db-entry|+1InjwFHiJF1|article|Erd60> <|db-entry> > <\db-entry|+1InjwFHiJFE|book|Knuth1997> <|db-entry> >> <\db-entry|+1InjwFHiJFG|article|DLS81> <|db-entry> B. R. > Comput.> <\db-entry|+1InjwFHiJFF|article|vdH:sparsefact> <|db-entry> J. van der > Complexity> <\db-entry|+UCHwtGN1sHSR59v|misc|JIL> <|db-entry> J. van der G. > > <\db-entry|+1AaCGEWSvQZ|misc|symbolic_data_project> <|db-entry> > > <\db-entry|+1AaCGEWSvQg|inproceedings|berthomieu_msolve_2021> <|db-entry> Ch. M. > <\references> <\collection> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > <\auxiliary> <\collection> <\associate|bib> TeXmacs:vdH:book TeXmacs:website BuClSh1997 vdH:slp Pan1966 Ilango2022 Br39 Str64 Yao76 Pipp76 Pipp78 Pipp80 Bern02 CG90 CK04 vdH:jncf LLMX10 JIL vdH:slp Br39 Erd60 Knuth1997 Yao76 DLS81 Str64 Bern02 Pipp76 Pipp78 Pipp80 Bern02 CG90 CK04 vdH:jncf vdH:sparsefact LLMX10 LLMX10 LLMX10 JIL vdH:slp CK04 symbolic_data_project berthomieu_msolve_2021 LLMX10 <\associate|figure> |1>|> Computation of several powers using Yao's method and Algorithm |->|-0.3em|>|0em||0em|>>. |> |2>|> Ratio |L/t> as a function of |t>, where |L> is the length of the SLP generated using different strategies, for random polynomials with |t> terms in |100> variables and exponents in |>. |> |3>|> Ratio |L/t> as a function of |t>, where |L> is the length of the SLP generated using different strategies, for random univariate polynomials of degree |\10> with |t> terms. |> |4>|> Times for builing SLPs of random polynomials with |100> terms and coefficients in |->|-0.3em|>|0em||0em|>>|>. |> |5>|> Ratio |L/t> as a function of |t>, where |L> is the length of the SLP generated using different strategies, for random bivariate polynomials of partial degrees |\31>. |> |6>|> Relative performance of the \Pcombined\Q strategy across the database. |> <\associate|toc> |math-font-series||font-shape||1.Introduction> |.>>>>|> |math-font-series||font-shape||2.Survey of known strategies> |.>>>>|> |2.1.Power trees |.>>>>|> > |2.2.Multiple powers |.>>>>|> > |2.3.Multiple power products |.>>>>|> > |2.4.Multivariate Horner schemes |.>>>>|> > |2.5.Bilinear schemes |.>>>>|> > |2.6.Partial factorizations |.>>>>|> > |math-font-series||font-shape||3.The new algorithm> |.>>>>|> |3.1.Replacing coefficients and powers by new variables |.>>>>|> > |3.2.Binary expansion of exponents |.>>>>|> > |3.3.Factoring out variables in a greedy manner |.>>>>|> > |3.4.Greedy evaluation of a list of products |.>>>>|> > |3.5.Simplification and polishing |.>>>>|> > |math-font-series||font-shape||4.Implementation> |.>>>>|> |4.1.Implemented strategies |.>>>>|> > |4.2.Sparse random polynomials |.>>>>|> > |4.3.Dense polynomials |.>>>>|> > |4.4.Our final \Pcombined\Q strategy |.>>>>|> > |math-font-series||font-shape||Bibliography> |.>>>>|>