> <\body> <\hide-preamble> equations given by straight-line programs>|<\doc-note> Grégoire Lecerf and Arnaud Minondo have been supported by the French NODE project. Joris van der Hoeven has been supported by an ERC-2023-ADG grant for the ODELIX project (number 101142171). <\wide-tabular> ||||||| ||LOGO_ERC-FLAG_FP.png>|61.62pt|28.51pt||>>>>> ||<\author-affiliation> Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161) CNRS, École polytechnique, Institut Polytechnique de Paris 1, rue Honoré d'Estienne d'Orves 91120 Palaiseau, France |>>||<\author-affiliation> Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161) CNRS, École polytechnique, Institut Polytechnique de Paris 1, rue Honoré d'Estienne d'Orves 91120 Palaiseau, France |>>||<\author-affiliation> Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161) CNRS, École polytechnique, Institut Polytechnique de Paris 1, rue Honoré d'Estienne d'Orves 91120 Palaiseau, France |<\author-email> minondo@lix.polytechnique.fr >>|>||<\doc-note> > line program which only uses ring operations. Under mild assumptions, our method essentially runs in linear time.>> In this paper, > will represent either > or >. Let > be an open subset of>, let > be a map \\>, we will consider the Ordinary Differential Equation (ODE) <\equation> ,\,x|)>=\,\,x|)> where ,\,x|)>:I\\> stands for an unknown function on an open interval \>. A> of > is a polynomial map =,\,\|)>\\,\,X|]>> with n> such that any solution ,\,x|)>> of on extends into a solution ,\,x|)>> of <\equation> ,\,x|)>=\,\,x|)> which is also defined on . We say that a solution ,\,x|)>:I\\> of() satisfies the initial condition <\equation*> ,\|)>\\\\ if \I> and |)>,\,x|)>|)>=\>. Finding such solutions is called an . This paper is devoted to the fast computation of polynomializations of >, along with the corresponding initial conditions. ODEs arising in many areas of physics and chemistry are often non-linear and include divisions, exponentiations, logarithms, square roots, etc. Although methods for ODE polynomialization are well known, we are not aware of complexity bounds for the case where> is given by a Straight-Line Program (SLP for short, defined in section). Although numerical integrators typically allow > to be an arbitrary blackbox function, it is most common that the implementation of > is actually an SLP. Our motivation is three-fold. From a theoretical point of view, we are interested in asharp worst case complexity bound. Polynomialization is of practical interest, because the quotients and special functions occurring in> are typically more expensive to evaluate than the simpler ring operations occurring in>. For example, when using a recent Intel> CPU with > architecture, an AVX512 division in double precision has a latency of23 CPU cycles and athroughput of16, while one \PFused Multiply-Add\Q instruction (see section) has alatency of4 and athroughput of0.5. Similar kinds of overhead are observed for other coefficient types such as intervals and balls>. Of course, in order to apply our work to speed up the numerical integration of(), one should use an ODE solver that is able to exploit the lower evaluation cost of >, while not suffering too much from the increased dimension n>. This is typically the case for Runge\UKutta integrators, for which the complexity depends linearly on the evaluation cost of >. Lazy Taylor series integrators implicitly use something close to polynomialization for the evaluation of > ( exponentials > are evaluated by solving the equation =f*g>). On the other hand, integrators that also compute the first variation ( in order to improve the stability) may suffer from the increased dimension. Finally, we plan to use polynomialization to simplify bound computations for certified Runge\UKutta ODE solvers as in. Such bound computations are easier and tighter for SLPs that only use ring operations. Decreasing the degrees of the polynomials> is also beneficial for this application. Polynomialization is certainly part of the folklore of differential algebra. Recent studies focus on quadratization, a polynomialization =,\,\|)>> with deg \\2>. Quadratization was established at least more than a century ago: historical and recent references, along with various applications, can be found in. Polynomialization is achieved in quadratic time in for a dense representation of polynomials. In the problem of finding the minimal number of extra variables necessary for quadratization is shown to be NPhard. In a quadratization with minimal order (namely >) is achieved for the dense representation of >. Polynomializations and quadratizations are not unique, and preserving stability is sometimes more important than the sole complexity issue; see various strategies in. Our first contribution is a new method for polynomializing ODEs represented by SLPs at the price of increasing the number of instructions by a small constant factor. The precise overhead depends on the arithmetic instructions allowed by the SLP framework. Theorem focuses on SLPs that only use additions, subtractions, and products, while Theorem achieves a lower overhead if we also allow \PFused Multiply-Add\Q instructions. Our second contribution concerns practical improvements over our worst case complexity bound. On the one hand, we restrict the additional costs to the subexpressions actually involved in the instructions of > which are not polynomials. On the other hand, we try to minimize the number of extra variables, namely >. Our optimizations and heuristics take linear time up to logarithmic factors. Even when polynomializing just afew instructions of a small SLP, our method turns out to of significant practical interest. Our third contribution is an implementation of our polynomialization algorithm within the library, which is a free software library for HPC (High-Performance Computing) computations with SLPs. Given \>, we write \,n-1|}>>. Given two tuples ,\,a|)>> and ,\,b|)>>, their ,\,a,b,\,b|)>> is written b>. We will encode names of variables by integers (typically of bit or bit machine size). For complexity analyses, we will assume that these integers are of size > and count unit time for operations on such integers. Similarly, we assume that elements of > take unit size. (Here we note that our main polynomialization algorithms from Theorems and require no arithmetic operations in>.) Below, we follow the SLP framework of. Let > be aset of operation symbols together with a function |\|>\\\\>. We call > a and|\|>> the of >, for any operation \\>. A with signature > is aset> together with afunction >\\|\|>>\\>> for every \\>. We often write > instead of>> if no confusion can arise. Let > be a domain with signature >. An is a tuple <\equation*> \=,a,\,a|\|>>|)> with >\\\\> and ,\,a|\|>>\\>, where |\|>\|\|>>. We call > the or and ,\,a|\|>>> the or . We denote by >> the set of such instructions. Given \>, we also let <\equation*> \,n>\,a,\,a|\|>>|)>\\>\\\\,a,\,a|\|>>\\|}>. A () over > is a quadruple >, where <\itemize> ,\,D-1>|)>\\>> is a tuple of data fields, ,\,I-1>|)>\\>>> is a tuple of pairwise distinct input locations, ,\,O-1>|)>\\>>> is a tuple of output locations, ,\,P-1>|)>\\,>>> is a tuple of instructions. We regard >> as the for our SLP. Each location in this working space can be represented using an integer in >> and instructions directly operate on the working space as follows. Given a \\>> of our working space, the of an instruction =,a,\,a|\|>>|)>\\,>> gives rise to a new state \\|)>> where \\> if a> and \\>>,\,\|\|>>>|)>> if >>. Given ,\,x>|)>\\>>, the corresponding begin state > of our SLP is given by \D> if ,\,I-1>|}>> and >\x> for \>>. Execution of the SLP next gives rise to a sequence of states \P|)>,\,\;>\P-1>-1;>|)>>. The ,\,y>|)>\\>> of our SLP are now given by \\;O>> for \>>. In this way our SLP gives rise to a function >\\>;x\y> that we will also denote by. Two SLPs are said to be if they compute the same function. It will be useful to call elements of >> and denote them by more intelligible symbols like ,\,v-1>> whenever convenient. The entries of and are called and , respectively. Non-input variables that also do not occur as destination operands of instructions in are called . An is avariable that is neither an input variable, nor an output variable, nor a constant. Given avariable \>>>, the corresponding is >. Input variables, output variables, and constants naturally correspond to , , and . For a subset \\>, we denote >\\>\\>\\|}>|\|>>; given \\>, we also abbreviate >\|}>>>. <\example> Let \\> and let =|}>> be the signatures of the binary sum and product. Let > be the SLP defined by >, >, >, and ,,0,0,0|)>>|)>>. With ,v,v> representing the values at positions in the working space, the instructions of rewrite into \v+2>, \v>. Consequently,> computes the \+2|)>>>. When dealing with operations that are only partially defined, such as the unary inverse over fields, we say that an SLP is at a given input when all its instructions have their arguments in their definition domain during the execution. <\remark> The bit sizes of the integers in and and in the instructions of are at most |)>>. For practical software implementations it is fair to consider that we always have \2>> or even \2>, so that these integers can be represented by hardware integers. This also \Pjustifies\Q why we count a unit cost for integer operations in our complexity analyses. The >, written >, will be the set of the following operations: <\itemize> unary identity >, defined by >:a\a>, unary negation , defined by >:a\-a>, binary subtraction, also written , defined by >:\a-b>>, binary addition , defined by >:\a+b>, binary multiplication >, defined by >:\a\b>. The > will be the union \\\\>, where > is the set of the following operations: <\itemize> unary inversion >, defined by >:a\a>, binary division , defined by >:\a/b>. Let > be a fixed finite set of operations > of arity one such that each \\> satisfies a differential equation <\equation*> ,x,\,x>>|)>=Q>,x,\,x>>,t|)>, where >:\>+1>\\> is given by an SLP with signature > over>. We also assume that >:t\,x,\,x>>|)>> is given by an SLP with signature \\> over>. In this case, we call > a field with >. For convenience, >> and >> will be rewritten in terms of macro instructions (following the usual programming language terminology). Precisely, we will assume given the following data: <\itemize> A segment of data > over > gathers all the constants and auxiliary variables involved in >> and >> for all \\>. For all \\>, a macro instruction >> takes >+1|)>>> integer arguments and returns a sequence of instructions of length written >|\|>>. The last argument of >> stands for the position of > in the working space. For any \\|\|>>>> and for all ,\,a>>,b,\,b>+1>> in |\|>>>, the execution of >,\,a>>,b,\,b>+1>,|\|>|)>> at the state \D> performs >,\,v>>>|)>>\Q>>,\,v>+1>>|)>> and only modifies the values at positions ,\,a>>> inside > (the data fields in > may be modified freely). For all \\>, a macro instruction >> takes >+2> integer arguments and returns a sequence of instructions of length written >|\|>>. Again, the last argument of >> is the position of > in the working space. For any\\|\|>>>> and all ,\,a>>,b> in |\|>>>, the execution of >,\,a>>,b,|\|>|)>>> at the state \D> performs >,\,v>>>|)>\q>>|)>> and only modifies the values at positions ,\,a>>> inside > (the data fields in > may be modified freely). <\example> The function may be included in > with the defining functions ,t|)>\|)>> and \>. We may take ,b,b,|\|>|)>\>\v>|)>> and ,b,|\|>|)>\>\exp v>|)>>. <\example> The function may be included in > with the defining functions ,x,t|)>\,-x|)>> and \>. We may take ,a,b,b,b,|\|>|)>>=>\v>,v>\v>*v>,v>\-v>|)>> and ,a,b,|\|>|)>=|\|>+c>\v>,v>\log v|\|>+c>,>\v|\|>+c>>|)>>, where stands for the position of an auxiliary variable in >. <\example> For any fixed \>, the fractional power function :t\t> may be included in > with the defining functions >,x,t|)>\*x,-x|)>> and >\,1/t|)>>. If the constant is stored at position > in > and if stands for the position of an auxiliary variable in >, then we may take >,a,b,b,b,|\|>|)>=|\|>+c>\v>,>\v>*v>>,>\v|\|>+c>*v>>,v>\v|\|>+c>*v|\|>+c>,v>\-v>|)>> and >,a,b,|\|>|)>=|\|>+c>\v>,v>\v|\|>+c>,>\v|\|>+c>>|)>>. An SLP :\\\> with signature > may be transformed into an equivalent SLP with at most divisions: one for each output. (In fact, one may even reduce to the case when we perform just one division, but this will not be needed in what follows.) Let us show how to compute an SLP =,,,|)>> for evaluating polynomials ,B,\,A,B> such that |B>,\,|B>|)>> and none of the > vanishes at the set \\> of points where is executable. Roughly speaking, we obtain the > and> by rewriting elements of > as fractions, which are encoded as pairs of (not necessarily coprime) numerators and denominators. This rewriting process is well known and takes linear time. It is a special instantiation of the lifting algorithm from5.5>. We now describe > more precisely and bound its size. We define the following tuples: <\itemize> \,\,>|)>\\+1>> is defined by \D> and \1>> for \>>. The last entry >> is an auxiliary variable which may be filled arbitrarily. An entry at position 2*> represents a numerator whose corresponding denominator is at position>; \,\,2*I-1>|)>>; \,2*O+1,\,2*O-1>,2*O-1>+1|)>>. The entries > and +1> represent the values of > and > for ,n>; \>,-1> P>, where > maps instructions of to sequences of instructions, as follows, where we write \v> ,\v>, and v>> as shorthands: <\eqnarray*> \v|)>>|>|\V,W\W|)>>>|\-v|)>>|>|\-V,W\W|)>>>|\v-v|)>>|>|V*W,V\V*W,V\A-V,W\W*W|)>>>|\v+v|)>>|>|V*W,V\V*W,V\A+V,W\W*W|)>>>|\v*v|)>>|>|\V*V,W\W*W|)>>>|\v|)>>|>|V,V\W,W\A|)>>>|\v/v|)>>|>|V,V\V*W,W\W*A|)>.>>>> For example, with > representing the value at position in the working space, the > of a sum \v+v> consists in computing <\equation*> |W>+|W>=*W+V*W|W*W>. We observe that |~>> is executable over >, that none of the > vanishes at > for ,n>, and that <\equation> |\|>\4*. <\example> Let 1> and let > be the SLP defined by >>, >, >, and ,|)>>. With ,v,v> representing the values at positions of the working space, the instructions of > rewrite into \v+2>, =v/v>. Consequently, >computes the rational function /+2|)>>. Applying the above construction gives us \>, \>, \>>, and > corresponds to the following sequence of instructions: \v*v>, \2*v>, \v+v>, \v*v>, \v>, \v*v>, \v*v>. The expressions of the output values at positions and are *v*v> and **v+2*v|)>>. So we have \X> and \X+2>. <\example> Let 1> and let > be the SLP defined by >, >, >, and \v/v,v\v|)>>. Although the first instruction is useless, > is not executable at . On the other hand, we have =X> and =1>. This shows that *\*B> may be inverted over a domain strictly larger than >. <\remark> In absence of \Paliasing\Q, the lifting process can be further optimized. For instance, if j>, then one may take \v|)>>\\W,W\V|)>>. Our implementation in JIL systematically exploits such optimizations; see also5.5>. An SLP :\\\> with signature > may be transformed into an SLP =,,,|)>> over > with signature > which evaluates over |]>/|)>|)>>, where > is a new variable. We define the following tuples: <\itemize> \,\,>|)>\\+1>> is defined by \D> and \0>> for \>>. The last entry >> is an auxiliary variable which may be filled arbitrarily. A variable > for becomes +v*\> for >. \,2*I+1,\,2*I-1>,2*I-1>+1|)>>; \,2*O+1,\,2*O-1>,2*O-1>+1|)>>; \>,-1> P>, where > maps instructions of to sequences of instructions, as follows, where we use \v> >\v>, and v>> as shorthands: <\eqnarray*> \v|)>>|>|\V,>\>|)>>>|\-v|)>>|>|\-V,>\->|)>>>|\v-v|)>>|>|\V-V,>\>->|)>>>|\v+v|)>>|>|\V+V,>\>+>|)>>>|\v*v|)>>|>|V*>,>\>*V,>\A+>,V\V*V|)>.>>>> Consequently, <\equation> |\|>\4*. We begin this section with a rather natural method for polynomializing SLPs over > with signature >. Then, we present our new faster and general approach. Let =|B>,\,|B>|)>> where \\,\,X|]>> and \\,\,X|]>>> for ,n>, and let ,\,x|)>> be a solution of defined over . We introduce the functions <\eqnarray*> >|>|>>|>|>|,\,x|)>>>>> over , for ,n>, and obtain <\eqnarray*> >|| A,\,x|)>,\,x|)>*x-A,\,x|)>*\ B,\,x|)>,\,x|)>*x>>|>|| B,\,x|)>,\,x|)>*x.>>>> Letting <\eqnarray*> >|>| A,\,X|)>,\,X|)>>>|>|>| B,\,X|)>,\,X|)>,>>>> the polynomial map <\equation*> \,\,X|)>\>>|>>|>>|*X-A*F*X>>|>>|*X-A*F*X>>|*X>>|>>|*X>>>>> is a polynomialization of >, since <\equation*> ,\,x|)>=\,\,x|)>. The computation of > can be done efficiently as follows. From the SLP for> with signature > we first determine an SLP|~>> with signature > which computes ,\,A,B,\,B>. The number of instructions of |~>> is 4*> using. By computing |~>+\*X,\,X+\*X|)>> modulo > we next build another SLP for evaluating >, >, >, and > for ,n> using at most |\|>> instructions, by. This finally leads to an SLP of length at most <\equation> 16*+6*n for the polynomialization of >. As to the initial conditions, we take: <\eqnarray*> |)>,\,x|)>|)>>|||)>>>||)>,\,x|)>|)>>|||)>,\,B|)>|)>.>>>> This requires 4*+n> ring operations plus inversions. \ In the previous subsection, we reduced the polynomialization of> to the case when it is explicitly given by rational functions. In this subsection, we present a new faster and more general method, which directly transforms all divisions and special functions of the SLP of >. Roughly speaking, a special instruction \> gives rise to >> new unknown functions ,\,u>>> with \\> and which satisfy the differential equation <\equation*> ,\,u>>|)>=v*Q>,\,u>>,v|)>. Our method is detailed in the proof of the following theorem, which improves upon the above polynomialization with cost. <\theorem> Let :\\\> be given by an SLP > over> with signature \\> and let >\1>, \1>, and <\equation*> \\p,\=\>\,/|}>\\>\>. We can compute the following SLPs over > in time ++n|)>>: <\itemize> ,,,|)>> uses > and represents a polynomialization > of> such that |\|>=2*+2*\>+|\|>+1>, |\|>=|\|>=2*n+\>>, and <\eqnarray*> |\|>>|>|>+7*,/|}>>+=\>\\>>|\|>+\>+2|)>.>>>> >,>,>,>|)>> uses \\>, computes the initial value ,\|)>>> for > from an initial value > for > at >, and satisfies >|\|>=+n+\>+|\|>>, >|\|>=n>, >|\|>=2*n+\>>, and >|\|>=+n+\>>. <\proof> The tuples ,,,> are defined as follows: <\itemize> \\+2*\>+|\|>+1>> with \D>, \0> for \>>, +2*\>+i>\D> for \|\|>>>. Other entries of > may be filled arbitrarily; \2*I\\,2*+2,\,2*+2*>-1|)>|)>>; \2*O\\+1,\,2*+2*>-1|)>+1|)>>; \>,-1> P>, where > maps instructions of to tuples of instructions with signature >. If > has type> then we abbreviate \v+i>>>, \v+i>>> and let \,\,\>-1>|)>>, \,\,\>-1>|)>>. We also abbreviate \v>, >\v>, v|\|>>>. The value of >> will actually be the derivative of the value of >, and will be an auxiliary variable. If \\>\\>, then we take <\eqnarray*> \v|)>>|>|\V,>\>|)>>>|\-v|)>>|>|\-V,>\->|)>>>|\v+v|)>>|>|\V+V,>\>+>|)>>>|\v-v|)>>|>|\V-V,>\>->|)>>>|\v*v|)>>|>|V*>,>\>*V,>\A+>,V\V*V|)>.>>>> If \\>, then we define <\eqnarray*> \v|)>>|>|\\,\\->,\\\*V,\\\*V,>\\|)>>>|\v/v|)>>|>|\->,\\\*\,\\\*\,\|\>>>|||>>A\V*\,>\>*\,>\A+>,V\V*\||)>.>>>> If \\>, then we take <\eqnarray*> \\|)>|)>>|>|\\|)>>>|||\>>,\,+\>-1>|\>,>>|||>,\,+\>-1>,>,\,+\>-1>,2*j,2*+2*\>|)>>>|||\>*\,\,\>-1>\>*\>-1>|)>>>|||>\\|)>.>>>> Let us now prove that ,,,|)>> represents a polynomialization of >. We first need to introduce appropriate notation. For this, let ,\,x|)>\\> be a solution of(), where > is the space of differentiable functions of the interval \> to >. Now consider the evaluation of > at ,\,x|)>> over >, as described in section, but with the modification that we introduce new function variables +1>,\,x+\>>\\> whenever we execute an instruction> with \\>\>>. Let \\>> be the current state of the working space just before the execution of >. If =\v|)>>>, then we define +1>\\>. If =\v/v|)>>, then we take +1>\\>. If =\\|)>|)>> with \\>, then we define <\equation*> +1>,\,x+\>>|)>\q>|)>. For ,n>, we also define \x> and we claim that <\equation*> ,\,x>>|)>=\,\,x>>|)>. We now evaluate ,\,x>>|)>> over >. Let \\|\|>>>> be the state just before the execution of P>. It will be convenient to abuse notation and use as an abbreviation of the current state> in addition to the usual semantics as a tuple of variables. We claim that we always have =v>, =>>, for all \>>. Just after copying the input values into the working space (and assuming that all auxiliary variables are initialized with zeros), the claim clearly holds. Assuming that the claim holds before the execution of P>, we need to show that it still holds afterwards. If =\\|)>|)>> or =\\,v|)>|)>> with \\>, then P> is the same operation > with arguments +>*\>, +>*\>, and possibly +>*\> in |]>/|)>>. Since belong to >>, the claim holds after the execution of P>. If =\v|)>>, then P> corresponds to computing <\eqnarray*> >|>|>*\>>|+>*\>|>|+\*\.>>>> By definition, we have =V> hence =|)>>. Therefore =\> after the execution, as claimed. If =\v/v|)>>, then P> corresponds to computing <\eqnarray*> >|>|>*\>>|+>*\>|>|*\+*\+>*\|)>*\.>>>> From =V> we obtain =/V|)>>. Hence =\> and *\+>*\=*\|)>> after the execution, as claimed. If =\\|)>|)>> with \\>, then P> computes <\eqnarray*> >|>|>*Q>,V|)>>>|+>*\>|>|+\*\.>>>> Consequently, =\> and our claim again holds after the execution of P>. This completes the proof of the first assertion. Let us now turn to the second assertion about the computation of the initial value. The tuples >,>,>,>> are defined as follows: <\itemize> >\\+n+\>+|\|>>> with >\D> for \>> and >+n+\>+i>\D> for \|\|>>>. Other entries of >> may be filled arbitrarily; >\I>; >\,\,+n-1|)>\O\+n,\,+n+\>-1|)>>; >\>\v>,\,v+n-1>\v>|)>\>\>> P>, where > is a function which maps instructions of to tuples of instructions as follows. If >\\>, then we take P\|)>>. If =\v|)>>, then <\eqnarray*> P>|>|>>>\v,v\v>>>|)>.>>>> If =\v/v|)>>, then <\eqnarray*> P>|>|>>>\v,v\v*v>>>|)>.>>>> If =\\|)>|)>> with \\>, then <\eqnarray*> P>|>|>>>,\,>+\>-1>,j,+n+\>|)>\\v>>>|)>.>>>> Let ,\,x|)>\\> be a solution of for the initial condition ,\,x|)>|)>=\>> and let ,\,x>>> be defined as above. Let :\>\\|\|>>> stand for the map defined by the SLP >,>,>,>|)>>. By construction, |)>=\\\|)>\,\,x>>|)>|)>>. As for the complexity, we note that >=O|)>> and that |\|>> is a constant. The constructions of ,,,> and >,>,>,>> take linear time. <\remark> In more precise complexity models such as computation trees, RAM machines, or many-tape Turing machines, the complexity bound of Theorem becomes ++n|)>*log++n|)>|)>>, due to the logarithmic space overhead for storing the indices of variables. On the other hand, the algorithm behind Theorem uses no operations in>, no hash tables, and no other complex operations on integers. Modern computers often support an efficient so-called \PFused Multiply-Add\Q (FMA) instruction set over floating point numbers in single and double precisions. For Intel> and ARM> CPUs and also NVidia> GPUs, these instructions compute expressions of the form a*b\c> as fast as a single product. This motivates us to define the > for a ring > as >, together with the following operations: <\itemize> the fused multiply-add, defined by >:\a*b+c>, the fused multiply-sub, defined by >:\a*b-c>, the fused negative multiply-add, defined by >:\-a*b+c>, the fused negative multiply-sub, defined by >:\-a*b-c>, the negative multiply, defined by >:\-a*b>. We revisit Theorem with FMA signatures. <\theorem> By allowing the SLPs for ,\,\> to be of signature>, Theorem holds with the sharper bound <\eqnarray*> |\|>>|>|>+5*,/|}>>+=\>\\>>|\|>+\>+2|)>.>>>> <\proof> The construction of the SLPs extends the one presented in the proof of Theorem, by taking FMA operations into account. On the one hand, we define <\eqnarray*> \v*v|)>>|>|V*>,>\>*V+A,V\V*V|)>>>|\-v*v|)>>|>|V*>,>\->*V-A,V\-V*V|)>>>|\v*v+v|)>>|>|V*>+>,>\>*V+A,V\V*V+V|)>>>>> and similarly for >, >, and >, hence the contribution >> in the complexity bound. One the other hand, we redefine <\eqnarray*> \v|)>>|>|\\,\\->*V,\\\*V,>\\|)>>>>> and <\eqnarray*> \v/v|)>>|>|\->*\,\\\*\,\|\>>>|||>>A\V*\,>\>*\+A,V\V*\||)>,>>>> hence the contribution ,/|}>>> in the complexity bound. <\remark> In fact, the factor in >> can often be further reduced. More precisely, let > be the number of ring nodes of whose ancestors are also all ring nodes. Then it can be verified that (after simplification of the SLP) the term >> can be replaced by >+3*>-|)>>. We implemented Theorem within the library (the present paper corresponds to version ). Currently, we support FMA instructions and > consists of all elementary functions , , >, , , , , , , , , , , , . is a free software library for HPC computations with SLPs. It provides a convenient interface for automatic differentiation, common sub-expression elimination, lifting, transposition, and much more. It also comes with a (Just In Time) compiler dedicated to SLPs. This compiler can generate executable machine code directly in memory, which is useful when the same SLP needs to be evaluated 1>> times in an efficient manner. The JIT compiler of has the advantage of being one or two orders of magnitude faster than general purpose compilers such asGCC or CLANG. Consequently, JIT compilation becomes beneficial for relatively small values of like 1000>. In addition, if we have several competing SLPs for the same task, then we may efficiently generate machine code for each of them and empirically select thebestSLP. The source code of our polynomialization can be found in the file of . Examples are in the sub-directory. Usual simplifications of SLPs in are done using the function (implemented in the file ): it folds constants, removes unneeded computations, and shares common subexpressions. The expected running time is linear, under the standard assumption that underlying hash tables are well balanced. Operations in > are only performed during constant folding4>. Therefore, the number of operations in > never exceeds the number of instructions of the SLP under simplification. Roughly speaking, the overhead in the term >> of is due to evaluating> over |]>/|)>>. In order to decrease this overhead, we first present an optimization which aims at reducing the number of first order derivatives> used by >: we will separate ring instructions needed by divisions and special functions and other operations. <\example> Consider the system <\equation*> x=x,x=x->. We represent it using the following SLP with signature >: \>> with \1>, >, >, and \v*v+1,\v>,v\v*v-v|)>>. The direct application of Theorem leads to an SLP for a polynomialization > with inputs and outputs and instructions, which drops to after simplification. It turns out that this SLP is suboptimal since we may simply define \|)>> and obtain the following shorter : <\equation> x=x,x=x-x,x=-2*x*x*x, which can be represented by an SLP with 5 instructions. In the above example, the second derivatives of > and > are not needed, because > only depends on >, whose first derivative is available in the SLP of > before computing >. We may compute the shorter polynomialization in a more systematic manner using the following optimization. <\optimization> Consider the SLP of >. If \\,\,x|)>> is computed before an instruction > with >\\> depends on >, then we may modify the polynomialization> constructed in Theorems and as follows: <\itemize> Just after ,\,x|)>> is computed, we copy its value into the position of > in the workingspace; The output corresponding to > is discarded in the SLP of >. In order to implement Optimization efficiently, we need to solve the following problem: for a given subset > of instructions of the SLP of > (namely the > with >\\>) and for each variable>, we wish to determine the first instruction in > that depends on >. This is a common task for compilers, which can be achieved in linear time using one forward and one backward pass through the SLP. More precisely, during the forward pass, one maintains an array that, for each variable >, indicates the (index of the) last instruction that modifies >. This information is used during the same pass to build an array that, for each argument >> of an instruction =\\>,\,v|\|>>>|)>|)>>, indicates the latest instruction that modified>>. During the backward pass, for each instruction =\\>,\,v|\|>>>|)>|)>>> and each input variable >, we determine the first instruction in > that depends on >. <\remark> Permuting independent instructions of the SLP might allow to reduce the dependencies and apply Optimization more aggressively. However, finding the best permutation might be rather expensive, so we left this issue for future investigation. <\example> If |)>=X>, then Optimization yields the simplified polynomialization ,X|)>=,-X|)>> in terms of the SLP defined by \\>, \>, \>, and \\v*v,v\-v*v|)>>. <\example> If ,X|)>=,X/X|)>>, then Optimization yields the simplified polynomialization ,X,X|)>=,X*X,-X*X|)>> in terms of a SLP with instructions. <\example> If we swap the variables and equations of Example, then ,X|)>=/X,X|)>> and Optimization yields the simplified polynomialization ,X,X,X|)>=*X,X,X*X,-X*X|)>> which takes instructions, but > remains an input of >. This example motivates our next optimization. <\optimization> Let the notation be as in Theorem. If ,n|}>>> and ,2*n+\>|}>>> are such that =\> and =\> then we may simplify the polynomialization into |~>,\,X,X,\,X>>|)>>\,\,\1>,\1>,\,\>>|)> ,\,X1>,X,X1>,\,X>>|)>> and |~>,\,X|)>\,\,\,\,\,\>>|)>>. <\example> In Example the values of > and > coincide. As for the initial conditions, we have ,X,T|)>\/X,X,X|)>>. Following Optimization, we may simplify the polynomialization into |~>,X,X|)>=*X,X,-X*X|)>>. In order to minimize the number of extra variables in our polynomializations, we may compute the values of ,\,x> using > and ,\,x>>> before evaluating >. For sure, we obtain a polynomialization with only >> variables, still in linear time, which is in general higher than the one obtained in Theorems and. Nevertheless, sometimes and thanks to the simplification routine of , this approach turns out to be useful. The construction summarizes as follows. <\optimization> Let the notation be as in Theorem and its proof. A polynomialization of > with >> inputs can be obtained as follows: <\enumerate> Compute an SLP |~>> over > with signature in > which takes ,\,x,x,\,x>>> as input and returns ,\,x|)>>. Construct the SLP of the polynomialization |~>> which takes ,\,x>,x,\,x>>> as input and returns ,\,\,\,\,\>>|)>,\,x,|~>,\,x,x,\,x>>|)>,\,|~>,\,x,x,\,x>>|)>,x,\,x>>|)>>. As for the initial conditions for |~>>, construct the SLP for |~>,\,x|)>>\|~>,\,|~>,|~>,\,|~>>>|)>,\,x|)>>. In our implementation, we compare the SLP resulting of Optimizations and with the one of Optimizations and: we return the one which has the smallest number of instructions. In case of equality, we return the one with the smallest number of variables.\ <\example> Optimization applied to Example yields a simplified polynomialization with 3 inputs and 2 instructions. <\example> Let (X)=1+1/(1+X)> and \\\\\\>. Evaluating > takes 12 instructions. Optimizations and yield a polynomialization with 15 instructions, while Optimizations and yield 16 instructions. We have shown how the polynomialization of an ODE given by an SLP can be achieved in linear time. Combined with suitable optimizations, our implementation returns efficient ODEs, especially for numerical integration. Quadratization seems to be a harder problem for SLPs. A first straightforward approach is to expand the SLP into a sparse polynomial and apply the algorithms from. However, this approach destroys the SLP structure. In particular, occasional products of linear forms are all expanded. We expect that more natural quadratizations exist for SLPs. Finding such SLPs with a better complexity is an interesting problem for future investigation. <\bibliography|bib|tm-plain|> <\bib-list|17> A.Ahlbäck, J.vander HoevenG.Lecerf. JIL: a high performance library for straight-line programs. , 2025. J.Alexandre Dit SandrettoA.Chapoutot. Validated explicit and implicit Runge\UKutta methods. , 22:79\U113, 2016. P.Bürgisser, M.ClausenM.A.Shokrollahi. , 315. Springer-Verlag, 1997. J.C.Butcher. . Wiley, Chichester, third, 2016. A.Bychkov, O.Issan, G.PogudinB.Kramer. Exact and optimal quadratization of nonlinear finite-dimensional nonautonomous dynamical systems. , 23(1):982\U1016, 2024. Y.CaiG.Pogudin. Dissipative quadratizations of polynomial ODE systems. B.FinkbeinerL.Kovács, , 14571, 323\U342. Springer, Cham, 2024. J.vonzur GathenJ.Gerhard. . Cambridge University Press, New York, 3rd, 2013. M.Hemery, F.FagesS.Soliman. On the complexity of quadratization for polynomial differential equations. A.Abate, T.PetrovV.Wolf, , 12314, 120\U140. Springer, Cham, 2020. M.Hemery, F.FagesS.Soliman. Compiling elementary mathematical functions into finite chemical reaction networks via a polynomialization algorithm for ODEs. E.CinquemaniL.Paulevé, , 12881, 74\U90. Springer, Cham, 2021. J.vander Hoeven. Relax, but don't be too lazy. Symbolic Comput.>, 34:479\U542, 2002. J.vander Hoeven. . Scypress, 2020. J.vander HoevenG.Lecerf. Towards a library for straight-line programs. , HAL, 2025. . Accepted for publication in J.vander Hoeven, G.LecerfA.Minondo. Static bounds for straight-line programs. , HAL, 2025. . J.vander Hoeven etal. GNU TeXmacs. , 1998. B.KramerG.Pogudin. Discovering Polynomial and Quadratic Structure in Nonlinear Ordinary Differential Equations. 2502.10005, arXiv, 2025. . R.E.Moore, R.B.KearfottM.J.Cloud. . SIAM Press, 2009. C.H.Papadimitriou. . Addison-Wesley, 1994. <\initial> <\collection> <\attachments> <\collection> <\associate|bib-bibliography> <\db-entry|+1PlYSeg92FURz2kB|book|TeXmacs:vdH:book> <|db-entry> > <\db-entry|+1PlYSeg92FURz2kC|misc|TeXmacs:website> <|db-entry> > > <\db-entry|+1BJhfN8iozrOhPM|techreport|vdH:matryoshka> <|db-entry> G. A. > > <\db-entry|+qYhUkmR1lNMNv5V|book|MKC09> <|db-entry> R. B. M. J. > <\db-entry|+18U0BHQF1kOygxpl|book|Butcher2016> <|db-entry> > <\db-entry|+cJSkMhbTYXNoYJ|article|vdH:relax> <|db-entry> > Symbolic Comput.> <\db-entry|+2NgrrZ7Q2JC|article|SandrettoChapoutot2016> <|db-entry> A. > <\db-entry|+2NgrrZ7Q2JB|article|BychkovIssanPogudinKramer2024> <|db-entry> O. G. B. > <\db-entry|+1uqicm49ef3|inproceedings|CaiPogudin2024> <|db-entry> G. > L. > <\db-entry|+2NgrrZ7Q2J7|inproceedings|HemeryFagesSoliman2020> <|db-entry> F. S. > T. V. > <\db-entry|+2NgrrZ7Q2Iv|inproceedings|HemeryFagesSoliman2021b> <|db-entry> F. S. > L. > <\db-entry|+2NgrrZ7Q2J9|techreport|KramerPogudin2025> <|db-entry> G. > > <\db-entry|+UCHwtGN1sHSR59v|misc|JIL> <|db-entry> J. van der G. > > <\db-entry|+1uqicm49ef2|techreport|vdH:slp> <|db-entry> G. > . Accepted for publication in > <\db-entry|+2MBfjYgs1WYemMtm|book|BuClSh1997> <|db-entry> M. M. A. > <\db-entry|+2WheVMn2twB3YIJ|book|GathenGerhard2013> <|db-entry> J. > <\db-entry|+qYhUkmR1lNMNv77|book|Pap94> <|db-entry> > <\references> <\collection> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > <\auxiliary> <\collection> <\associate|bib> TeXmacs:vdH:book TeXmacs:website vdH:matryoshka MKC09 Butcher2016 vdH:relax SandrettoChapoutot2016 BychkovIssanPogudinKramer2024 CaiPogudin2024 HemeryFagesSoliman2020 HemeryFagesSoliman2021b KramerPogudin2025 HemeryFagesSoliman2020 HemeryFagesSoliman2021b HemeryFagesSoliman2020 BychkovIssanPogudinKramer2024 CaiPogudin2024 JIL vdH:slp vdH:slp vdH:slp vdH:slp BuClSh1997 GathenGerhard2013 Pap94 JIL vdH:slp vdH:slp BychkovIssanPogudinKramer2024 <\associate|toc> |math-font-series||font-shape||1.Introduction> |.>>>>|> |1.1.Motivation |.>>>>|> > |1.2.Related work |.>>>>|> > |1.3.Our contributions |.>>>>|> > |math-font-series||font-shape||2.Straight-line programs> |.>>>>|> |2.1.Definition |.>>>>|> > |2.2.Standard signatures |.>>>>|> > |2.3.Postponing divisions |.>>>>|> > |2.4.Tangent numbers |.>>>>|> > |math-font-series||font-shape||3.Polynomialization> |.>>>>|> |3.1.SLPs with divisions |.>>>>|> > |3.2.SLPs with special functions |.>>>>|> > |3.3.Exploiting FMA instructions |.>>>>|> > |math-font-series||font-shape||4.Implementation> |.>>>>|> |math-font-series||font-shape||5.Conclusion> |.>>>>|> |math-font-series||font-shape||Bibliography> |.>>>>|> <\links> <\collection> > > > > > > > > > >