Polynomialization of ordinary differential
equations given by straight-line programs

Joris van der Hoevenᵃ, Grégoire Lecerfᵇ, Arnaud Minondoᶜ

Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161)

CNRS, École polytechnique, Institut Polytechnique de Paris

1, rue Honoré d'Estienne d'Orves

91120 Palaiseau, France

a. Email: vdhoeven@lix.polytechnique.fr

b. Email: lecerf@lix.polytechnique.fr

c. Email: minondo@lix.polytechnique.fr

Preliminary version of February 6, 2026

Grégoire Lecerf and Arnaud Minondo have been supported by the French ANR-22-CE48-0016 NODE project. Joris van der Hoeven has been supported by an ERC-2023-ADG grant for the ODELIX project (number 101142171).

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

This article has been written using GNU TeXmacs [11, 14].

Given a system of ordinary differential equations represented by a straight-line program, we show how to compute an equivalent ordinary differential system represented by a straight-line program which only uses ring operations. Under mild assumptions, our method essentially runs in linear time.

1. Introduction

In this paper, will represent either or . Let be an open subset of and let be a map . We will consider the Ordinary Differential Equation (ODE)

(1)

where stands for an unknown function on an open interval . A polynomialization of is a polynomial map with such that any solution of (1) on extends into a solution of

(2)

which is also defined on . We say that a solution of (1) satisfies the initial condition

if and . Finding such solutions is called an initial value problem.

This paper is devoted to the fast computation of polynomializations of , along with the corresponding initial conditions.

1.1. Motivation

ODEs arising in many areas of physics and chemistry are often non-linear and include divisions, exponentiations, logarithms, square roots, etc. Although methods for ODE polynomialization are well known, we are not aware of complexity bounds for the case where is given by a Straight-Line Program (SLP for short, defined in section 2). While numerical integrators typically allow to be an arbitrary black-box function, it is most common that the implementation of is actually an SLP.

Our motivation is three-fold. From a theoretical point of view, we are interested in a sharp worst case complexity bound.

Polynomialization is of practical interest, because the quotients and special functions occurring in are typically more expensive to evaluate than the simpler ring operations occurring in . For example, when using a recent Intel™ CPU with Skylake™ architecture, an AVX-512 division in double precision has a latency of 23 CPU cycles and a throughput of 16, while one “Fused Multiply-Add” instruction (see section 3.3) has a latency of 4 and a throughput of 0.5. Similar kinds of overhead are observed for other coefficient types such as intervals and balls [13, 16].

Of course, in order to apply our work to speed up the numerical integration of (1), one should use an ODE solver that is able to exploit the lower evaluation cost of , while not suffering too much from the increased dimension . This is typically the case for Runge–Kutta integrators [4], for which the complexity depends linearly on the evaluation cost of . Lazy Taylor series integrators [10] implicitly use something close to polynomialization for the evaluation of (e.g. exponentials are evaluated by solving the equation ). On the other hand, integrators that also compute the first variation (e.g. in order to improve the stability) may suffer from the increased dimension.

Finally, we plan to use polynomialization to simplify bound computations for certified Runge–Kutta ODE solvers as in [2]. Such bound computations are easier and tighter for SLPs that only use ring operations. Decreasing the degrees of the polynomials is also beneficial for this application.

1.2. Related work

Polynomialization is certainly part of the folklore of differential algebra. Recent studies focus on quadratization, i.e. a polynomialization with . Quadratization was established more than a century ago: historical and recent references, along with various applications, can be found in [5, 6, 8, 9, 15]. Polynomialization is achieved in quadratic time in [8, 9] for a dense representation of polynomials. In [8] the problem of finding the minimal number of extra variables necessary for quadratization is shown to be NP-hard. In [5] a quadratization with minimal order (namely ) is achieved for the dense representation of . Polynomializations and quadratizations are not unique, and preserving stability is sometimes more important than complexity alone; see various strategies in [6].

1.3. Our contributions

Our first contribution is a new method for polynomializing ODEs represented by SLPs at the price of increasing the number of instructions by a small constant factor. The precise overhead depends on the arithmetic instructions allowed by the SLP framework. Theorem 9 focuses on SLPs that only use additions, subtractions, and products, while Theorem 11 achieves a lower overhead if we also allow “Fused Multiply-Add” instructions.

Our second contribution concerns practical improvements over our worst case complexity bound. On the one hand, we restrict the additional costs to the subexpressions actually involved in the instructions of which are not polynomials. On the other hand, we try to minimize the number of extra variables, namely . Our optimizations and heuristics take linear time up to logarithmic factors. Even when polynomializing just a few instructions of a small SLP, our method turns out to be of significant practical interest.

Our third contribution is an implementation of our polynomialization algorithm within the JIL library [1, 12], which is a free software C++ library for HPC (High-Performance Computing) computations with SLPs.

2. Straight-line programs

Given , we write . Given two tuples and , their concatenation is written . We will encode names of variables by integers (typically of bit or bit machine size). For complexity analyses, we will assume that these integers are of size and count unit time for operations on such integers. Similarly, we assume that elements of take unit size. (Here we note that our main polynomialization algorithms from Theorems 9 and 11 require no arithmetic operations in .) Below, we follow the SLP framework of [12].

2.1. Definition

Let be a set of operation symbols together with a function . We call a signature and the arity of , for any operation . A domain with signature is a set together with a function for every . We often write instead of if no confusion can arise.

Let be a domain with signature . An instruction is a tuple

with operation and arguments , where . We call the destination argument or operand and the source arguments or operands. We denote by the set of such instructions. Given , we also let

A straight-line program (SLP) over is a quadruple , where

We regard as the working space for our SLP. Each location in this working space can be represented using an integer in and instructions directly operate on the working space as follows.

Given a state of our working space, the execution of an instruction gives rise to a new state where if and if . Given input values , the corresponding begin state of our SLP is given by if and for . Execution of the SLP next gives rise to a sequence of states . The output values of our SLP are now given by for . In this way our SLP gives rise to a function that we will also denote by . Two SLPs are said to be equivalent if they compute the same function.

It will be useful to call elements of variables and denote them by more intelligible symbols like whenever convenient. The entries of and are called input and output variables, respectively. Non-input variables that also do not occur as destination operands of instructions in are called constants. An auxiliary variable is a variable that is neither an input variable, nor an output variable, nor a constant. Given a variable , the corresponding data field is . Input variables, output variables, and constants naturally correspond to input fields, output fields, and constant fields. For a subset , we denote ; given , we also abbreviate .

Example 1. Let and let be the signatures of the binary sum and product. Let be the SLP defined by , , , and . With representing the values at positions in the working space, the instructions of rewrite into , . Consequently, computes the function .
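The execution model of section 2.1 can be sketched in a few lines of Python. The encoding below, triples (dest, op, args) acting on an integer-indexed working space, is a hypothetical simplification for illustration only, not JIL's actual representation; the toy SLP computes (x + y) * y.

```python
# Minimal SLP interpreter sketch (hypothetical encoding).  An SLP is a
# list of instructions (dest, op, args) acting on a working space
# indexed by integers, as in the state-based semantics above.

OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def run_slp(instructions, inputs, outputs, values, workspace_size):
    """Execute an SLP: copy inputs in, run the instructions, read outputs."""
    state = [0] * workspace_size
    for var, val in zip(inputs, values):
        state[var] = val
    for dest, op, args in instructions:
        state[dest] = OPS[op](*(state[a] for a in args))
    return [state[var] for var in outputs]

# A toy SLP computing (x + y) * y, with x at position 0, y at position 1,
# and position 2 serving as both auxiliary variable and output:
gamma = [(2, "+", (0, 1)), (2, "*", (2, 1))]
print(run_slp(gamma, inputs=(0, 1), outputs=(2,),
              values=(3, 4), workspace_size=3))   # prints [28]
```

Note that the working space holds input fields, auxiliary fields, and output fields alike, mirroring the description above.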

When dealing with operations that are only partially defined, such as the unary inverse over fields, we say that an SLP is executable at a given input when all its instructions have their arguments in their definition domain during the execution.

Remark 2. The bit sizes of the integers in and and in the instructions of are at most . For practical software implementations it is fair to consider that we always have or even , so that these integers can be represented by hardware integers. This also “justifies” why we count a unit cost for integer operations in our complexity analyses.

2.2. Standard signatures

The standard signature of a ring , written , will be the set of the following operations:

The standard signature of a field will be the union , where is the set of the following operations:

Let be a fixed finite set of operations of arity one such that each satisfies a differential equation

where is given by an SLP with signature over . We also assume that is given by an SLP with signature over . In this case, we call a field with special functions .

For convenience, and will be rewritten in terms of macro instructions (following the usual programming language terminology). Precisely, we will assume given the following data:

Example 3. The function may be included in with the defining functions and . We may take and .

Example 4. The function may be included in with the defining functions and . We may take and , where stands for the position of an auxiliary variable in .

Example 5. For any fixed , the fractional power function may be included in with the defining functions and . If the constant is stored at position in and if stands for the position of an auxiliary variable in , then we may take and .

2.3. Postponing divisions

An SLP with signature may be transformed into an equivalent SLP with at most divisions: one for each output. (In fact, one may even reduce to the case when we perform just one division, but this will not be needed in what follows.)

Let us show how to compute an SLP for evaluating polynomials such that and none of the vanishes at the set of points where is executable. Roughly speaking, we obtain the and by rewriting elements of as fractions, which are encoded as pairs of (not necessarily coprime) numerators and denominators. This rewriting process is well known and takes linear time. It is a special instantiation of the lifting algorithm from [12, section 5.5].

We now describe more precisely and bound its size. We define the following tuples:

For example, with representing the value at position in the working space, the of a sum consists in computing

We observe that is executable over , that none of the vanishes at for , and that

(3)

Example 6. Let and let be the SLP defined by , , , and . With representing the values at positions of the working space, the instructions of rewrite into , . Consequently, computes the rational function . Applying the above construction gives us , , , and corresponds to the following sequence of instructions: , , , , , , . The expressions of the output values at positions and are and . So we have and .
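The lifting process of this subsection can be illustrated with a minimal Python sketch, assuming values are carried as (numerator, denominator) pairs; the helper names are hypothetical, and, exactly as in the construction above, the pairs are not reduced to coprime form.

```python
# Sketch of the division-postponing rewrite: every value is carried as a
# pair (numerator, denominator) and only ring operations are used until
# the very end, where one division per output recovers the result.

def frac_add(a, b):         # (p, q) stands for p/q
    (p, q), (r, s) = a, b
    return (p * s + r * q, q * s)

def frac_mul(a, b):
    (p, q), (r, s) = a, b
    return (p * r, q * s)

def frac_div(a, b):
    (p, q), (r, s) = a, b
    return (p * s, q * r)   # a division becomes two products

def frac_finalize(a):
    p, q = a
    return p / q            # the single postponed division

# Compute x / (x + 1) for x = 3 without any intermediate division:
x, one = (3, 1), (1, 1)
res = frac_div(x, frac_add(x, one))
print(frac_finalize(res))   # prints 0.75
```

Since numerators and denominators are not kept coprime, the pairs may grow, but each original instruction is replaced by a constant number of ring operations, which is what the linear-time bound uses.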

Example 7. Let and let be the SLP defined by , , , and . Although the first instruction is useless, is not executable at . On the other hand, we have and . This shows that may be inverted over a domain strictly larger than .

Remark 8. In absence of “aliasing”, the lifting process can be further optimized. For instance, if , then one may take . Our implementation in JIL systematically exploits such optimizations; see also [12, section 5.5].

2.4. Tangent numbers

An SLP with signature may be transformed into an SLP over with signature which evaluates over , where is a new variable. We define the following tuples:

Consequently,

(4)
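The evaluation over tangent numbers can be sketched with dual numbers in Python: a pair (a, b) stands for a + b·ε with ε² = 0, so sums and products follow the truncated-series rules, and an SLP run on such pairs returns values together with first derivatives. This is a hypothetical minimal sketch, not JIL's implementation.

```python
# Dual-number sketch of evaluation over K[eps]/(eps^2): running an SLP
# on pairs (val, der) computes both the value and its first derivative.

class Tangent:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        return Tangent(self.val + o.val, self.der + o.der)
    def __sub__(self, o):
        return Tangent(self.val - o.val, self.der - o.der)
    def __mul__(self, o):
        # (a + b eps)(c + d eps) = a c + (a d + b c) eps, since eps^2 = 0
        return Tangent(self.val * o.val,
                       self.val * o.der + self.der * o.val)

# Derivative of x^3 at x = 2: seed the input with der = 1.
x = Tangent(2.0, 1.0)
y = x * x * x
print(y.val, y.der)   # prints 8.0 12.0
```

Each original instruction expands into a bounded number of instructions on the pairs, in accordance with the size bound (4).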

3. Polynomialization

We begin this section with a rather natural method for polynomializing SLPs over with signature . Then, we present our new, faster, and more general approach.

3.1. SLPs with divisions

Let where and for , and let be a solution of (1) defined over . We introduce the functions

over , for , and obtain

Letting

the polynomial map

is a polynomialization of , since

The computation of can be done efficiently as follows.

From the SLP for with signature we first determine an SLP with signature which computes . The number of instructions of is using (3).

By computing modulo we next build another SLP for evaluating , , , and for using at most instructions, by (4). This finally leads to an SLP of length at most

(5)

for the polynomialization of .

As to the initial conditions, we take:

This requires ring operations plus inversions.

3.2. SLPs with special functions

In the previous subsection, we reduced the polynomialization of to the case when it is explicitly given by rational functions. In this subsection, we present a new faster and more general method, which directly transforms all divisions and special functions of the SLP of . Roughly speaking, a special instruction gives rise to new unknown functions with and which satisfy the differential equation

Our method is detailed in the proof of the following theorem, which improves upon the above polynomialization with cost (5).

Theorem 9. Let be given by an SLP over with signature and let , , and

We can compute the following SLPs over in time :

  • uses and represents a polynomialization of such that , , and

  • uses , computes the initial value for from an initial value for at , and satisfies , , , and .

Proof. The tuples are defined as follows:

Let us now prove that represents a polynomialization of . We first need to introduce appropriate notation. For this, let be a solution of (1), where is the space of differentiable functions of the interval to . Now consider the evaluation of at over , as described in section 2.1, but with the modification that we introduce new function variables whenever we execute an instruction with . Let be the current state of the working space just before the execution of . If , then we define . If , then we take . If with , then we define

For , we also define and we claim that

We now evaluate over . Let be the state just before the execution of . It will be convenient to abuse notation and use as an abbreviation of the current state in addition to the usual semantics as a tuple of variables. We claim that we always have , i.e. , for all . Just after copying the input values into the working space (and assuming that all auxiliary variables are initialized with zeros), the claim clearly holds. Assuming that the claim holds before the execution of , we need to show that it still holds afterwards.

If or with , then is the same operation with arguments , , and possibly in . Since belong to , the claim holds after the execution of .

If , then corresponds to computing

By definition, we have hence . Therefore after the execution, as claimed.

If , then corresponds to computing

From we obtain . Hence and after the execution, as claimed.

If with , then computes

Consequently, and our claim again holds after the execution of . This completes the proof of the first assertion.

Let us now turn to the second assertion about the computation of the initial value. The tuples are defined as follows:

Let be a solution of (1) for the initial condition and let be defined as above. Let stand for the map defined by the SLP . By construction, .

As for the complexity, we note that and that is a constant. The constructions of and take linear time.

Remark 10. In more precise complexity models such as computation trees, RAM machines [3, 7], or many-tape Turing machines [17], the complexity bound of Theorem 9 becomes , due to the logarithmic space overhead for storing the indices of variables. On the other hand, the algorithm behind Theorem 9 uses no operations in , no hash tables, and no other complex operations on integers.
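As a small sanity check of the construction behind Theorem 9, consider the scalar equation x' = exp(x) (a hypothetical example of ours, not taken from the text). Introducing the new unknown v = exp(x) gives v' = v·x', so the system x' = v, v' = v·v uses only ring operations. The sketch below integrates the polynomialized system with a standard RK4 scheme and checks that the invariant v = exp(x) is preserved numerically.

```python
import math

# Polynomialized system for x' = exp(x) with v = exp(x):
#     x' = v,   v' = v * v
# with initial condition x(0) = 0, v(0) = exp(0) = 1.

def rk4(f, y, h):
    """One classical Runge-Kutta step of size h for y' = f(y)."""
    k1 = f(y)
    k2 = f([a + h/2*b for a, b in zip(y, k1)])
    k3 = f([a + h/2*b for a, b in zip(y, k2)])
    k4 = f([a + h*b for a, b in zip(y, k3)])
    return [a + h/6*(b + 2*c + 2*d + e)
            for a, b, c, d, e in zip(y, k1, k2, k3, k4)]

def poly_field(y):            # right-hand side, ring operations only
    x, v = y
    return [v, v * v]

y = [0.0, 1.0]
h, steps = 1e-3, 500          # integrate on [0, 0.5]
for _ in range(steps):
    y = rk4(poly_field, y, h)
x, v = y
print(abs(v - math.exp(x)) < 1e-9)   # prints True: v = exp(x) is preserved
```

The exact solution is v(t) = 1/(1 - t), so v(0.5) is close to 2; the invariant v = exp(x) holds along the flow up to the integration error.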

3.3. Exploiting FMA instructions

Modern computers often support an efficient so-called “Fused Multiply-Add” (FMA) instruction set over floating point numbers in single and double precision. For Intel™ and ARM™ CPUs and also NVidia™ GPUs, these instructions compute expressions of the form as fast as a single product. This motivates us to define the FMA signature for a ring as , together with the following operations:

We revisit Theorem 9 with FMA signatures.

Theorem 11. By allowing the SLPs for to be of signature , Theorem 9 holds with the sharper bound

(6)

Proof. The construction of the SLPs extends the one presented in the proof of Theorem 9, by taking FMA operations into account. On the one hand, we define

and similarly for , , and , hence the contribution in the complexity bound. On the other hand, we redefine

and

hence the contribution in the complexity bound.

Remark 12. In fact, the factor in can often be further reduced. More precisely, let be the number of ring nodes of whose ancestors are also all ring nodes. Then it can be verified that (after simplification of the SLP) the term can be replaced by .
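As a toy illustration of the FMA signature, the sketch below adds a ternary fma operation to a hypothetical (dest, op, args) SLP encoding and uses it for a Horner evaluation, where each step a·b + c costs a single instruction instead of two.

```python
# Toy SLP interpreter with an FMA instruction (hypothetical encoding).

def run_fma_slp(instructions, state):
    ops = {"+": lambda a, b: a + b,
           "*": lambda a, b: a * b,
           "fma": lambda a, b, c: a * b + c}   # one fused instruction
    for dest, op, args in instructions:
        state[dest] = ops[op](*(state[a] for a in args))
    return state

# Horner evaluation of 2*x^2 + 3*x + 5 at x = 10 with two FMAs.
# State layout: 0: x, 1: 2, 2: 3, 3: 5, 4: accumulator.
state = [10.0, 2.0, 3.0, 5.0, 0.0]
prog = [(4, "fma", (1, 0, 2)),   # 2*x + 3
        (4, "fma", (4, 0, 3))]   # (2*x + 3)*x + 5
print(run_fma_slp(prog, state)[4])   # prints 235.0
```

Without the fma operation, the same evaluation would require four instructions (two products and two sums), in line with the constant-factor savings of Theorem 11.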

4. Implementation

We implemented Theorem 11 within the JIL library [1, 12] (the present paper corresponds to GIT version 58a6ba506f8216260f6b8e8d28cc4311ddcfd8e7). Currently, we support FMA instructions and consists of all elementary functions , , , , , , , , , , , , , , . JIL is a free software C++ library for HPC computations with SLPs. It provides a convenient interface for automatic differentiation, common sub-expression elimination, lifting, transposition, and much more. It also comes with a JIT (Just In Time) compiler dedicated to SLPs. This compiler can generate executable machine code directly in memory, which is useful when the same SLP needs to be evaluated times in an efficient manner.

The JIT compiler of JIL has the advantage of being one or two orders of magnitude faster than general purpose compilers such as GCC or CLANG. Consequently, JIT compilation becomes beneficial for relatively small values of like . In addition, if we have several competing SLPs for the same task, then we may efficiently generate machine code for each of them and empirically select the best SLP.

The source code of our polynomialization can be found in the file src/ode/slp_ode_polynomialize.cpp of JIL. Examples are in the check/ode/ sub-directory. Usual simplifications of SLPs in JIL are done using the simplify function (implemented in the file src/transforms/slp_simplify.cpp): it folds constants, removes unneeded computations, and shares common subexpressions. The expected running time is linear, under the standard assumption that underlying hash tables are well balanced. Operations in are only performed during constant folding [12, section 4]. Therefore, the number of operations in never exceeds the number of instructions of the SLP under simplification.
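One of the simplifications mentioned above, the removal of unneeded computations, can be sketched with a single backward liveness pass. The encoding (dest, op, args) is hypothetical; JIL's simplify function additionally folds constants and shares common subexpressions.

```python
# Dead-instruction removal sketch: a backward pass keeps only the
# instructions whose destinations are (transitively) needed by the
# outputs, in linear time.

def remove_dead(instructions, outputs):
    live = set(outputs)
    kept = []
    for dest, op, args in reversed(instructions):
        if dest in live:
            kept.append((dest, op, args))
            live.discard(dest)      # dest is (re)defined here...
            live.update(args)       # ...so its arguments become live
    kept.reverse()
    return kept

gamma = [(2, "+", (0, 1)),   # needed by the output
         (3, "*", (0, 0)),   # dead: position 3 is never used afterwards
         (4, "*", (2, 1))]   # needed by the output
print(remove_dead(gamma, outputs=(4,)))
# keeps only the two instructions contributing to position 4
```

This matches the expected linear running time, since each instruction is visited exactly once.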

Roughly speaking, the overhead in the term of (6) is due to evaluating over . In order to decrease this overhead, we first present an optimization which aims at reducing the number of first order derivatives used by : we will separate the ring instructions needed by divisions and special functions from the other operations.

Example 13. Consider the system

We represent it using the following SLP with signature : with , , , and . The direct application of Theorem 11 leads to an SLP for a polynomialization with inputs and outputs and instructions, which drops to after simplification. It turns out that this SLP is suboptimal since we may simply define and obtain the following shorter polynomialization:

(7)

which can be represented by an SLP with 5 instructions.

In the above example, the second derivatives of and are not needed, because only depends on , whose first derivative is available in the SLP of before computing . We may compute the shorter polynomialization in a more systematic manner using the following optimization.

Optimization 14. Consider the SLP of . If is computed before an instruction with depends on , then we may modify the polynomialization constructed in Theorems 9 and 11 as follows:

  • Just after is computed, we copy its value into the position of in the working space;

  • The output corresponding to is discarded in the SLP of .

In order to implement Optimization 14 efficiently, we need to solve the following problem: for a given subset of instructions of the SLP of (namely the with ) and for each variable , we wish to determine the first instruction in that depends on . This is a common task for compilers, which can be achieved in linear time using one forward and one backward pass through the SLP.

More precisely, during the forward pass, one maintains an array that, for each variable , indicates the (index of the) last instruction that modifies . This information is used during the same pass to build an array that, for each argument of an instruction , indicates the latest instruction that modified . During the backward pass, for each instruction and each input variable , we determine the first instruction in that depends on .
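The two passes just described can be sketched as follows, under a hypothetical (dest, op, args) encoding: the forward pass links every argument to the instruction that produced it, and the backward pass propagates, for each instruction, the earliest instruction of the chosen subset S that transitively depends on its result.

```python
# Forward/backward dependency passes (sketch).  Returns, for each
# instruction i, the index of the first instruction in S that
# transitively depends on the value computed by i (or None).

def first_dependent_in_S(instructions, S):
    n = len(instructions)
    last_write = {}                   # variable -> last producing instruction
    producers = []                    # per instruction: producers of its args
    for i, (dest, op, args) in enumerate(instructions):   # forward pass
        producers.append([last_write.get(a) for a in args])
        last_write[dest] = i
    first = [i if i in S else n for i in range(n)]
    for i in range(n - 1, -1, -1):    # backward pass
        for j in producers[i]:
            if j is not None:
                first[j] = min(first[j], first[i])
    return [f if f < n else None for f in first]

gamma = [(2, "*", (0, 0)),   # i = 0
         (3, "+", (2, 1)),   # i = 1
         (4, "*", (3, 3))]   # i = 2
print(first_dependent_in_S(gamma, S={2}))   # prints [2, 2, 2]
```

Both passes visit each instruction once, so the whole analysis runs in linear time, as claimed above.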

Remark 15. Permuting independent instructions of the SLP might make it possible to reduce the dependencies and to apply Optimization 14 more aggressively. However, finding the best permutation might be rather expensive, so we have left this issue for future investigation.

Example 16. If , then Optimization 14 yields the simplified polynomialization in terms of the SLP defined by , , , and .

Example 17. If , then Optimization 14 yields the simplified polynomialization in terms of an SLP with instructions.

Example 18. If we swap the variables and equations of Example 17, then and Optimization 14 yields the simplified polynomialization which takes instructions, but remains an input of . This example motivates our next optimization.

Optimization 19. Let the notation be as in Theorem 9. If and are such that and then we may simplify the polynomialization into and .

Example 20. In Example 18 the values of and coincide. As for the initial conditions, we have . Following Optimization 19, we may simplify the polynomialization into .

In order to minimize the number of extra variables in our polynomializations, we may compute the values of using and before evaluating . In this way, we obtain a polynomialization with only variables, still in linear time, which is in general higher than the one obtained in Theorems 9 and 11. Nevertheless, sometimes, and thanks to the simplification routine of JIL, this approach turns out to be useful. The construction is summarized as follows.

Optimization 21. Let the notation be as in Theorem 9 and its proof. A polynomialization of with inputs can be obtained as follows:

  1. Compute an SLP over with signature in which takes as input and returns .

  2. Construct the SLP of the polynomialization which takes as input and returns .

  3. As for the initial conditions for , construct the SLP for .

In our implementation, we compare the SLP resulting from Optimizations 14 and 19 with the one from Optimizations 21 and 19: we return the one which has the smallest number of instructions. In case of equality, we return the one with the smallest number of variables.

Example 22. Optimization 21 applied to Example 18 yields a simplified polynomialization with 3 inputs and 2 instructions.

Example 23. Let and . Evaluating takes 12 instructions. Optimizations 21 and 19 yield a polynomialization with 15 instructions, while Optimizations 14 and 19 yield 16 instructions.

5. Conclusion

We have shown how the polynomialization of an ODE given by an SLP can be achieved in linear time. Combined with suitable optimizations, our implementation returns efficient ODEs, especially for numerical integration.

Quadratization seems to be a harder problem for SLPs. A first straightforward approach is to expand the SLP into a sparse polynomial and apply the algorithms from [5]. However, this approach destroys the SLP structure. In particular, occasional products of linear forms are all expanded. We expect that more natural quadratizations exist for SLPs. Finding such quadratizations with a better complexity is an interesting problem for future investigation.

Bibliography

[1]

A. Ahlbäck, J. van der Hoeven, and G. Lecerf. JIL: a high performance library for straight-line programs. https://sourcesup.renater.fr/projects/jil, 2025.

[2]

J. Alexandre Dit Sandretto and A. Chapoutot. Validated explicit and implicit Runge–Kutta methods. Reliable Computing (electronic edition), 22:79–113, 2016.

[3]

P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic Complexity Theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.

[4]

J. C. Butcher. Numerical methods for ordinary differential equations. Wiley, Chichester, third edition, 2016.

[5]

A. Bychkov, O. Issan, G. Pogudin, and B. Kramer. Exact and optimal quadratization of nonlinear finite-dimensional nonautonomous dynamical systems. SIAM J. Appl. Dyn. Syst., 23(1):982–1016, 2024.

[6]

Y. Cai and G. Pogudin. Dissipative quadratizations of polynomial ODE systems. In B. Finkbeiner and L. Kovács, editors, Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2024, volume 14571 of Lect. Notes Comput. Sci., pages 323–342. Springer, Cham, 2024.

[7]

J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 3rd edition, 2013.

[8]

M. Hemery, F. Fages, and S. Soliman. On the complexity of quadratization for polynomial differential equations. In A. Abate, T. Petrov, and V. Wolf, editors, Computational Methods in Systems Biology. CMSB 2020, volume 12314 of Lect. Notes Comput. Sci., pages 120–140. Springer, Cham, 2020.

[9]

M. Hemery, F. Fages, and S. Soliman. Compiling elementary mathematical functions into finite chemical reaction networks via a polynomialization algorithm for ODEs. In E. Cinquemani and L. Paulevé, editors, Computational Methods in Systems Biology. CMSB 2021, volume 12881 of Lect. Notes Comput. Sci., pages 74–90. Springer, Cham, 2021.

[10]

J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34:479–542, 2002.

[11]

J. van der Hoeven. The Jolly Writer. Your Guide to GNU TeXmacs. Scypress, 2020.

[12]

J. van der Hoeven and G. Lecerf. Towards a library for straight-line programs. Technical Report, HAL, 2025. https://hal.science/hal-05075591. Accepted for publication in Appl. Algebra Eng. Commun. Comput.

[13]

J. van der Hoeven, G. Lecerf, and A. Minondo. Static bounds for straight-line programs. Technical Report, HAL, 2025. https://hal.science/hal-05105518.

[14]

J. van der Hoeven et al. GNU TeXmacs. https://www.texmacs.org, 1998.

[15]

B. Kramer and G. Pogudin. Discovering Polynomial and Quadratic Structure in Nonlinear Ordinary Differential Equations. Technical Report 2502.10005, arXiv, 2025. https://arxiv.org/abs/2502.10005.

[16]

R. E. Moore, R. B. Kearfott, and M. J. Cloud. Introduction to Interval Analysis. SIAM Press, 2009.

[17]

C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.