Département de Mathématiques (Bât. 425), Université Paris-Sud, 91405 Orsay Cedex, France

Abstract. A real number $x$ is said to be effective if there exists an algorithm which, given a required tolerance $\varepsilon \in \mathbb{Z}\,2^{\mathbb{Z}}$, returns a binary approximation $\tilde{x} \in \mathbb{Z}\,2^{\mathbb{Z}}$ for $x$ with $|\tilde{x}-x| \le \varepsilon$. Effective real numbers are interesting in areas of numerical analysis where numerical instability is a major problem. One key problem with effective real numbers is to perform intermediate computations at the smallest precision which is sufficient to guarantee an exact end-result. In this paper we first review two classical techniques to achieve this: a priori error estimates and interval analysis. We next present two new techniques: ``relaxed evaluations'' reduce the amount of re-evaluations at larger precisions, while ``balanced error estimates'' automatically provide good tolerances for intermediate computations.

1. Introduction

A dyadic number is a number of the form $m\,2^{e}$ with $m, e \in \mathbb{Z}$. We denote by $\mathbb{D} = \mathbb{Z}\,2^{\mathbb{Z}}$ the set of dyadic numbers and by $\mathbb{D}^{>} = \{x \in \mathbb{D} : x > 0\}$ the set of positive dyadic numbers. Given $x \in \mathbb{R}$ and $\varepsilon \in \mathbb{D}^{>}$, an $\varepsilon$-approximation for $x$ is a number $\tilde{x} \in \mathbb{D}$ with $|\tilde{x} - x| \le \varepsilon$. An approximation algorithm for $x \in \mathbb{R}$ is an algorithm which takes a tolerance $\varepsilon \in \mathbb{D}^{>}$ on input and which returns an $\varepsilon$-approximation for $x$. A real number $x \in \mathbb{R}$ is said to be effective if it admits an approximation algorithm.

The aim of this paper is to review several recent techniques for computations with effective real numbers and to present a few new ones. Effective real numbers can be useful in areas of numerical analysis where numerical instability is a major problem, like singularity theory. In such areas, multiple precision arithmetic is often necessary and the required precisions for auxiliary computations may vary heavily. We hope that the development of an adequate computational theory for effective real numbers will allow us both to perform the error analysis automatically and to determine the precisions at which intermediate computations have to take place.

We recall that there exists no general zero-test for effective real numbers. Nevertheless, exact or heuristically reliable zero-tests do exist for interesting subfields, which contain transcendental constants. However, this topic will not be studied in this paper and we refer to [1,2] for some recent work on this matter.

In an object-oriented language like C++, a natural way to represent an effective real number is by an abstract object with a method which corresponds to the approximation algorithm. When using this representation, which will be discussed in more detail in section 2, it is natural to perform a priori error estimations for common operations on real numbers. In other words, if we want to compute $y = f(x_1,\ldots,x_r)$ with tolerance $\varepsilon$, then we determine tolerances $\varepsilon_1,\ldots,\varepsilon_r$ such that the multiple precision evaluation of $f$ at arbitrary $\varepsilon_i$-approximations for the $x_i$ always yields an $\varepsilon$-approximation for $y$. In some cases, such a priori error estimates are quite pessimistic.

An alternative technique for computing with effective real numbers is interval arithmetic. In this case, we approximate an effective real number $x$ by an interval $[\tilde x^-, \tilde x^+]$ which contains $x$. Then the evaluation of $y = f(x_1,\ldots,x_r)$ comes down to the determination of an interval $[\tilde y^-, \tilde y^+]$ with
\[ f([\tilde x_1^-, \tilde x_1^+],\ldots,[\tilde x_r^-, \tilde x_r^+]) \subseteq [\tilde y^-, \tilde y^+]. \]
For continuous functions $f$, when starting with very precise approximations of the $x_i$ (i.e. when the widths $\tilde x_i^+ - \tilde x_i^-$ are very small), the obtained error estimate for $y$ also becomes as small as desired. In section 3, this technique of a posteriori error estimates will be reviewed in more detail, as well as an efficient representation for high-precision intervals. Unfortunately, in some cases, both a priori and a posteriori error estimates are quite pessimistic.
In sections 4 and 5, we will therefore present two new techniques for the computation of ``adaptive error estimates''. These techniques combine the advantages of a priori and a posteriori error estimates, while eliminating their major disadvantages. Moreover, they remain reasonably easy to implement in a general purpose system for computations with effective real numbers. Under certain restrictions, the new techniques are also close to being optimal. This point will be discussed in section 6.

The approaches presented in sections 2 and 3 have been proposed, often independently, by several authors [3,4,5], and we notice the existence of several similarities with the theory of effective power series [6,7]. In the past, we experimentally implemented such techniques in the case of power series [7] and real numbers [8] (the latter implementation was not officially distributed). We are currently working on a more robust C++ library based on the ideas in this paper [9]. Currently, this library contains the basic arithmetic operations. Some other implementations with similar objectives are known to the author [10,11]. All these implementations are free software and they are mainly based on the GMP and MPFR libraries [12,13].

2. A priori error estimates

A natural way to represent an effective real number $x$ is by its approximation algorithm. Conceptually speaking, this means that we view $x$ as a black box which can be asked for approximations up to any desired precision:
\[ \varepsilon \;\longrightarrow\; \fbox{$x$} \;\longrightarrow\; \tilde x, \qquad |\tilde x - x| \le \varepsilon. \]
In an object oriented language like C++, this can be implemented using an abstract base class real_rep with a virtual method for the approximation algorithm. Effective real numbers will then be pointers to real_rep. For instance,

<\code>
class real_rep {
public:
  virtual dyadic approx (const dyadic& tol) = 0;
};
typedef real_rep* real;
</code>

Here dyadic stands for the class of dyadic numbers. In practice, these may be taken to be arbitrary precision floating point numbers. For simplicity, we also do not care about memory management. In a real implementation, one would need a mechanism for reference counting or conservative garbage collection.

Now assume that we want to evaluate $y = f(x_1,\ldots,x_r)$ for a given tolerance $\varepsilon \in \mathbb{D}^{>}$, where $x_1,\ldots,x_r$ are effective real numbers and $f$ is an operation like $+$, $\times$ or $\exp$. In order to make $y$ effective, we need to construct an approximation algorithm for $y$ as a function of $x_1,\ldots,x_r$. This involves both the dyadic evaluation of $f$ at dyadic approximations of the $x_i$ and the determination of tolerances $\varepsilon_1,\ldots,\varepsilon_r$ for these approximations. More precisely, $\varepsilon_1,\ldots,\varepsilon_r$ should be such that for any $\tilde x_1,\ldots,\tilde x_r \in \mathbb{D}$ with $|\tilde x_i - x_i| \le \varepsilon_i$ $(i=1,\ldots,r)$, we have
\[ |\tilde f_{\varepsilon}(\tilde x_1,\ldots,\tilde x_r) - f(x_1,\ldots,x_r)| \le \varepsilon, \]
where $\tilde f_{\varepsilon}$ stands for a dyadic approximation algorithm for $f$ which depends on the tolerance $\varepsilon$. For instance, in the case when $f$ is the addition, we may use exact arithmetic on $\mathbb{D}$ (so that $\tilde f_{\varepsilon} = f$ for all $\varepsilon$) and take $\varepsilon_1 = \varepsilon_2 = \varepsilon/2$. This yields the following class for representing sums of real numbers:

<\code>
class add_real_rep: public real_rep {
  real x, y;
public:
  add_real_rep (const real& x2, const real& y2):
    x (x2), y (y2) {}
  dyadic approx (const dyadic& tol) {
    return x->approx (tol >> 1) + y->approx (tol >> 1); }
};
</code>

The addition can now be implemented as follows:

<\code>
inline real operator + (const real& x, const real& y) {
  return new add_real_rep (x, y); }
</code>

Notice that, in a sense, we have really represented the sum of $x$ and $y$ by the expression $x + y$ (more generally, such expressions are dags, i.e. directed acyclic graphs). Nevertheless, the representation using an abstract class provides additional flexibility.
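To see what a priori estimates look like for a non-linear operation, here is a sketch of a possible mul_real_rep class. It is not taken from the library described in this paper; it assumes an additional virtual method bound() on real_rep returning a non-zero rough upper bound for the absolute value of the represented number (such a method is discussed below), as well as downward rounded divisions on dyadic numbers.

<\code>
// Sketch only: a priori error estimates for multiplication, assuming a
// hypothetical virtual method bound() with |x| <= x->bound () and
// divisions on dyadic numbers which are rounded downwards.
class mul_real_rep: public real_rep {
  real x, y;
public:
  mul_real_rep (const real& x2, const real& y2):
    x (x2), y (y2) {}
  dyadic approx (const dyadic& tol) {
    dyadic Bx= x->bound ();           // Bx >= |x|
    dyadic tol_y= tol / (2 * Bx);     // tolerance for y
    dyadic By= y->bound () + tol_y;   // By >= |y~| for any tol_y-approximation y~
    dyadic tol_x= tol / (2 * By);     // tolerance for x
    // |x~ y~ - x y| <= |x~ - x| |y~| + |x| |y~ - y|
    //               <= tol_x By + Bx tol_y <= tol
    return x->approx (tol_x) * y->approx (tol_y); }
};
</code>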
For instance, we may attach additional information to the class real_rep, like the best currently known approximation for the number (thereby avoiding unnecessary recomputations). In practice, it is also good to provide an additional abstract method for computing a rough upper bound for the number, as assumed in the above sketch. This gives a fine-grained control over potential cancellations.

The above approach heavily relies on the a priori computation of error estimates (i.e. the computation of the $\varepsilon_i$). If no additional techniques are used, then this leads to the following disadvantages:

P1. We do not take advantage of the fact that the numeric evaluation of $\tilde f_{\varepsilon}(\tilde x_1,\ldots,\tilde x_r)$ may lead to an approximation of $y$ which is far better than the required tolerance $\varepsilon$. Indeed, multiple precision computations are usually done with a precision which is a multiple of the number of bits $w$ in a machine word. ``On average'', we therefore gain something like $w/2$ bits of precision. In section 4, we will show that it may actually be profitable to systematically compute more than necessary.

P2. The error estimates may be pessimistic, due to badly balanced expressions. For instance, consider the $\varepsilon$-approximation of a sum $x_1+(x_2+(x_3+\cdots+(x_{r-1}+x_r)\cdots))$, which corresponds to a completely unbalanced tree. Then the above technique would lead to the computation of an $(\varepsilon/2^{i})$-approximation of $x_i$ for $i < r$ and an $(\varepsilon/2^{r-1})$-approximation of $x_r$. If $r$ is large, then $\varepsilon/2^{r-1}$ is unnecessarily small, since a mere $(\varepsilon/r)$-approximation for each $x_i$ would do. In section 5, we will consider a general technique for computing ``balanced error estimates''. Badly balanced expressions naturally occur when evaluating polynomials using Horner's rule.

3. A posteriori error estimates

An alternative technique for computing with effective real numbers is interval arithmetic. The idea is to systematically compute interval approximations instead of floating point approximations. These intervals must be such that the real numbers we are interested in are certified to lie in their respective interval approximations. More precisely, given $x \in \mathbb{R}$ and $\varepsilon \in \mathbb{D}^{>}$, an $\varepsilon$-interval for $x$ is a closed interval $[\tilde x^-, \tilde x^+] \ni x$ with $\tilde x^-, \tilde x^+ \in \mathbb{D}$ and $\tilde x^+ - \tilde x^- \le 2\,\varepsilon$.

Concretely speaking, we may represent such intervals by their endpoints $\tilde x^-$ and $\tilde x^+$. Alternatively, if the precisions of $\tilde x^-$ and $\tilde x^+$ are large, then it may be more efficient to represent the interval by its center $(\tilde x^- + \tilde x^+)/2$ and its radius $(\tilde x^+ - \tilde x^-)/2$. Indeed, the exact endpoints of the interval are not that important. Hence, modulo a slight increase of the radius, we may always assume that the radius can be stored in a ``single precision dyadic number'' $m\,2^{e}$ with $m \in \{0,\ldots,2^{w}-1\}$, where $w$ is the number of bits in a machine word. This trick allows us to reduce the number of multiple precision computations by a factor of two.

Now assume that we want to compute an $\varepsilon$-approximation for $y = f(x_1,\ldots,x_r)$, where $x_1,\ldots,x_r$ are effective real numbers and where $f$ is a continuous function. Assume also that we have reasonable initial $\varepsilon_i$-intervals for the $x_i$. Then, starting with a low precision $n$, we first compute $(\varepsilon_i/2^{n})$-intervals $[\tilde x_i^-, \tilde x_i^+]$ for the $x_i$. We next evaluate $f$ using interval arithmetic. This yields an interval $[\tilde y^-, \tilde y^+]$ with
\[ f([\tilde x_1^-, \tilde x_1^+],\ldots,[\tilde x_r^-, \tilde x_r^+]) \subseteq [\tilde y^-, \tilde y^+]. \]
If $\tilde y^+ - \tilde y^- \le 2\,\varepsilon$, then $(\tilde y^- + \tilde y^+)/2$ is an $\varepsilon$-approximation for $y$. Otherwise, we increase the precision $n$ and repeat the computations. Under relatively mild assumptions on the way we evaluate $f$, this procedure eventually stops, since $f$ is continuous.
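The following self-contained sketch illustrates the center-radius idea in ordinary double precision. It is an illustration only: a real implementation would use a multiple precision center, as explained above, together with much sharper outward rounding. The helpers up and half_ulp are ad hoc devices introduced here so that the enclosures remain certified in spite of rounding errors (ignoring underflow).

<\code>
#include <cmath>
#include <cstdio>
#include <limits>

// Minimal "center +/- radius" interval sketch in double precision.
struct ball {
  double c;  // center
  double r;  // radius: certified bound on |true value - c|
};

// round a nonnegative double upwards by one ulp (crude outward rounding)
static double up (double x) {
  return std::nextafter (x, std::numeric_limits<double>::infinity ());
}

// bound on the rounding error of one operation whose result is x
static double half_ulp (double x) {
  return std::ldexp (std::fabs (x), -53);
}

ball add (ball a, ball b) {
  double c= a.c + b.c;
  // true sum lies in c +/- (a.r + b.r + rounding error of c)
  double r= up (up (a.r + b.r) + half_ulp (c));
  return ball {c, r};
}

ball mul (ball a, ball b) {
  double c= a.c * b.c;
  // |x y - c| <= |a.c| b.r + |b.c| a.r + a.r b.r + rounding error of c
  double r= up (up (up (std::fabs (a.c) * b.r) + up (std::fabs (b.c) * a.r))
                + up (up (a.r * b.r) + half_ulp (c)));
  return ball {c, r};
}

int main () {
  ball x= {1.0, 1e-10}, y= {3.0, 1e-10};
  ball z= mul (add (x, y), y);   // encloses (x + y) * y
  std::printf ("%.17g +/- %.3g\n", z.c, z.r);
}
</code>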
Although this technique of a posteriori error estimates does solve the problems P1 and P2 raised in the previous section, it also induces some new problems. Most importantly, we have lost the fine-grained control over the precisions of intermediate computations during the evaluation of $f(x_1,\ldots,x_r)$. Indeed, we only have control over the overall starting precision $n$. This disadvantage is reflected in two ways:

P3. It is not clear how to increase $n$. Ideally speaking, $n$ should be increased in such a way that the computation time of $[\tilde y^-, \tilde y^+]$ is doubled at each iteration. In that case (see section 6), the overall computation time is bounded by a constant times the computation time at the least precision which leads to an $\varepsilon$-approximation for $y$. Now the problem is that this ``overall computation time'' can be estimated well by hand for elementary operations like $+$, $\times$, $\exp$ and $\log$, but not necessarily for more complicated functions.

P4. Consider the case when, somewhere during the evaluation of $f$, we need to compute the sum $u + x$ of a very large number $u$ and a very small number $x$. Assume moreover that $u$ can be approximated very fast, but that the approximation of $x$ requires a lot of time. Since $x$ is very small, it will then be possible to skip the computation of $x$, by setting $u + x \approx u$, unless the required precision is very large. However, the above technique of a posteriori error estimates does not allow for this optimization.

4. Relaxed evaluation

Relaxed evaluation can be used in order to ``combine the good ideas'' of sections 2 and 3. The idea is to endow real_rep with an extra field best which corresponds to the best currently known interval approximation of the real number. Moreover, when requesting a better approximation, we will actually compute a much better approximation, so as to avoid expensive recomputations when we repeatedly ask for slightly better approximations. This anticipating strategy was first introduced in the case of power series, where it leads to important algorithmic gains [7,14]. In our context, the proposed method is quite different though, because of the problem with carries.

More precisely, assume that we have an $r$-ary operation $f$, such that the $n$-digit approximation of $f(x_1,\ldots,x_r)$ at dyadic numbers with at most $n$ digits has time complexity $T(n)$. The complexity of an elementary operation is usually a well-understood, regular function (in particular, $T$ is increasing). For instance, we have $T(n) \sim c_{+}\,n$ for addition and $T(n) \sim c_{\times}\,n^{\alpha}$ for multiplication, where $\alpha \ge 1$ and $\alpha$ decreases slowly from $2$ to $1$ as $n$ increases.

Now assume that we have an $n$-digit approximation of $f(x_1,\ldots,x_r)$ and that we request a slightly better approximation. Then we let $n' > n$ be such that $T(n') \ge 2\,T(n)$ and we replace our $n$-digit approximation by an $n'$-digit approximation. This strategy has the property that the successive evaluation of $f(x_1,\ldots,x_r)$ at $1,\ldots,n$ digits requires a time $T^{*}(n) \le T(n_1)+\cdots+T(n_k)$, where $n_1 < \cdots < n_k$ are such that $n_{k-1} < n \le n_k$ and $T(n_{i+1}) \ge 2\,T(n_i)$ for $i=1,\ldots,k-1$. Consequently,
\[ T^{*}(n) \le T(n_k) + \tfrac{1}{2}\,T(n_k) + \tfrac{1}{4}\,T(n_k) + \cdots \le 2\,T(n_k) \le 4\,T(n_{k-1}) \le 4\,T(n). \]
More generally, the evaluation of $f(x_1,\ldots,x_r)$ at different precisions $n_1,\ldots,n_l$ requires at most four times as much time as the evaluation of $f(x_1,\ldots,x_r)$ at precision $\max\{n_1,\ldots,n_l\}$.

The combination of a priori and a posteriori error estimates in the relaxed strategy clearly solves problems P1 and P2. Furthermore, at the level of the class real_rep, we have a fine-grained control over how to increase the computational precision, which also deals with problem P4. Moreover, we have shown how to perform this increase in an efficient way, as a function of $T$. This avoids expensive recomputations when $x$ occurs many times in a dag. In other words, the relaxed technique also solves problem P3.
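As a quick sanity check of this factor four bound, the following standalone snippet simulates the strategy for a hypothetical cost function $T(n) = n^2$ (an arbitrary model, roughly the cost of naive multiplication) and compares the total cost of answering requests at every precision $1,\ldots,n$ with $4\,T(n)$.

<\code>
#include <cstdio>

// Hypothetical cost function, for illustration only: T(n) = n^2.
static long long T (long long n) { return n * n; }

int main () {
  const long long n_max= 100000;
  long long total= 0;   // cost of all re-evaluations performed
  long long cur= 0;     // precision of the best known approximation
  for (long long n= 1; n <= n_max; n++) {
    if (n <= cur) continue;            // stored approximation good enough
    long long next= cur == 0 ? n : cur;
    while (next < n || T (next) < 2 * T (cur))
      next++;                          // smallest n' >= n with T(n') >= 2 T(cur)
    total += T (next);                 // re-evaluate at the overshot precision
    cur= next;
  }
  std::printf ("total cost of all re-evaluations: %lld\n", total);
  std::printf ("4 * T(n_max)                    : %lld\n", 4 * T (n_max));
}
</code>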
Let us illustrate the relaxed strategy on the concrete example of the computation of the sine function. We assume that we have implemented a suitable class interval for certified arbitrary precision computations with intervals. First of all, real_rep now becomes:

<\code>
class real_rep {
protected:
  interval best;
public:
  inline real_rep (): best (interval::fuzzy) {}
  virtual interval approx (const dyadic& tol) = 0;
};
</code>

Here interval::fuzzy stands for the interval $(-\infty, \infty)$. Sines of numbers are represented by instances of the following class:

<\code>
class sin_real_rep: public real_rep {
  real x;
public:
  inline sin_real_rep (const real& x2): x (x2) {
    best= interval (-1, 1); }
  interval approx (const dyadic& tol);
};
</code>

The approximation algorithm is given by

<\code>
interval sin_real_rep::approx (const dyadic& tol) {
  if (tol < radius (best)) {
    interval xa= x->approx (tol * (1 - DELTA));
    int required_prec= 2 - expo (tol);
    int proposed_prec= next_prec (-expo (radius (best)));
    xa= truncate (xa, max (required_prec, proposed_prec));
    best= sin (xa);
  }
  return best;
}
</code>

This algorithm requires some explanations. First of all, DELTA stands for a small number like $2^{-w}$, where $w$ is the machine word size. We use it for countering the effect of rounding errors in the subsequent computations. Next, expo stands for the function which gives the exponent of a dyadic number; we have $2^{\operatorname{expo}(x)-1} \le |x| < 2^{\operatorname{expo}(x)}$ for all non-zero $x \in \mathbb{D}$. The function next_prec (see below) applied to a precision $n$ computes a precision $n' > n$ such that $T(n') \ge 2\,T(n)$, where $T$ stands for the complexity of the evaluation of the sine function. Now we recall that the argument $x$ may already have been approximated much better than we need here (this is typically the case if a more precise value for the argument is needed in another part of the global computation). We therefore truncate the precision of xa to the precision we actually need, before computing the next approximation for the sine of $x$.

The function next_prec may for instance be implemented as follows:

<\code>
int next_prec (int prec) {
  if (prec <= 16 * WORD_SIZE) return 1.41 * prec + WORD_SIZE;
  else if (prec <= 256 * WORD_SIZE) return 1.55 * prec;
  else if (prec <= 4096 * WORD_SIZE) return 1.71 * prec;
  else return 2 * prec;
}
</code>

The different thresholds correspond to the precisions where we use naive multiplication, Karatsuba multiplication and two ranges of F.F.T. multiplication. The multiplication factors are chosen so that the evaluation cost approximately doubles from one precision to the next: for instance, $1.41 \approx \sqrt{2}$ doubles a quadratic cost and $1.55 \approx 2^{1/\log_2 3}$ doubles a Karatsuba cost $\sim n^{\log_2 3}$. We finally have the following algorithm for computing sines:

<\code>
inline real sin (const real& x) {
  return new sin_real_rep (x); }
</code>

Remark 1. A current limitation of the above approach is that it only takes into account the local computational complexity at each node and not the global complexity of all induced computations. For instance, in the routine approx for the computation of $\sin x$, it may happen that the precision $n'$ of xa is only slightly higher than the precision $n$ of best. In that case, the next approximation of $\sin x$ is only computed at precision $n' = n + O(1)$ and not at a precision (like next_prec($n$)) which doubles the computation time. When this problem occurs repeatedly, we thereby lose the principal advantage of the relaxed approach. Nevertheless, it should be noticed that the slight redundancy in the precision $n'$ can only occur if this higher precision was requested by a cheaper computation than the present one (except when $f$ has an exponential computational complexity). In fact, the author currently does not know of any natural example of a computation where the above problem occurs systematically (so as to make the computation time increase by more than a constant factor).

One remedy is to delay the actual evaluation until the very end and first gradually increase precisions until we are sure that the total re-evaluation cost is about twice as large as the previous evaluation cost (considering only those nodes which are re-evaluated). However, this is only possible if we do not need the actual result of a re-evaluation during the ``prospective phase'' in which we increase the precisions. Yet another approach would be to systematically use the real number analogue of relaxed power series evaluations [7]. However, this requires the re-implementation of all basic operations on floating point numbers in a relaxed way. It also involves an additional overhead.
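To make the growth prescribed by next_prec concrete, the following standalone snippet (with WORD_SIZE set to 64 as an arbitrary assumption) iterates the function above, starting from one machine word, and prints the resulting ladder of precisions. Roughly twenty steps suffice to go from one machine word to over a million bits, which is why systematically overshooting in this way remains cheap.

<\code>
#include <cstdio>

const int WORD_SIZE= 64;   // assumed machine word size, for illustration

// next_prec as in the text above
int next_prec (int prec) {
  if (prec <= 16 * WORD_SIZE) return 1.41 * prec + WORD_SIZE;
  else if (prec <= 256 * WORD_SIZE) return 1.55 * prec;
  else if (prec <= 4096 * WORD_SIZE) return 1.71 * prec;
  else return 2 * prec;
}

int main () {
  int prec= WORD_SIZE;
  while (prec < (1 << 20)) {           // up to about a million bits
    int next= next_prec (prec);
    std::printf ("%8d -> %8d bits (ratio %.2f)\n",
                 prec, next, double (next) / prec);
    prec= next;
  }
}
</code>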
5. Balanced error estimates

One of the main problems with a priori error estimates, as we mentioned in section 2, is that badly balanced expressions lead to overly pessimistic bounds. In this section we introduce the general technique of ``balanced error estimates'', which solves this problem as well as possible in a general purpose system. The idea behind balanced error estimates is to distribute the required tolerance over the subexpressions in a way which is proportional to their weights. Here leaves have weight $1$ and the weight of an expression $f(x_1,\ldots,x_r)$ is given by $\operatorname{wt} f(x_1,\ldots,x_r) = \operatorname{wt} x_1 + \cdots + \operatorname{wt} x_r$.

Remark 2. We recall that expressions are really dags, so the size of an expression may be exponentially smaller than its weight. Indeed, this can be seen by constructing an expression using repeated squarings $x, x \cdot x, (x \cdot x) \cdot (x \cdot x), \ldots$. It is therefore important to represent the weight by a dyadic number (i.e. in floating point form) with an exponent which fits into a machine word.

Let us first illustrate the strategy of balanced error estimates in the case of addition. So assume that we want to compute an $\varepsilon$-interval for a sum $x + y$, where $x$ has weight $p$ as an expression and $y$ has weight $q$. Then we will compute a $\frac{p}{p+q}\,\varepsilon$-interval for $x$ and a $\frac{q}{p+q}\,\varepsilon$-interval for $y$. For example, in the case of the expression $x_1+(x_2+(x_3+(x_4+x_5)))$, a tolerance $\varepsilon$ at the root is distributed as follows: $x_1$ receives $\frac{1}{5}\,\varepsilon$ and the subexpression $x_2+(x_3+(x_4+x_5))$ receives $\frac{4}{5}\,\varepsilon$, which is in turn split into $\frac{1}{5}\,\varepsilon$ for $x_2$ and $\frac{3}{5}\,\varepsilon$ for $x_3+(x_4+x_5)$, and so on, so that each leaf $x_i$ ends up with the tolerance $\frac{1}{5}\,\varepsilon$. Subtraction is treated in a similar way as addition.

Remark 3. Consider a dag of weight $w$ with one root and only additions and subtractions. If a tolerance $\varepsilon$ is specified at the root, then the above algorithm requires a tolerance $w_i\,\varepsilon/w$ at each node of weight $w_i$. Notice that any correct algorithm should require a tolerance of at most $\varepsilon$ instead of $w_i\,\varepsilon/w$ at such a node. In other words, the precision with respect to which we compute never exceeds the absolutely necessary precision by more than $\log_2 w$ binary digits.

Remark 4. Unfortunately, $\log_2 w$ can be linear in the size of the dag, as we saw in the case of repeated squarings (and more generally in the case of loops or recursive functions). Nevertheless, if $w$ is exponential in the size, then the precision loss generally only causes the overall complexity to increase by a constant factor. Let us illustrate this point by an example: consider the computation of a $2^{-n}$-approximation of $x_n$, where
\[ x_n = f(f(\cdots f(1)\cdots)) \quad (n \text{ times}), \qquad f(x) = x + x + x. \]
Then our algorithm requires the computation of approximations of the $x_i$ at precisions of at most $n + n\log_2 3$ binary digits, which takes a time $O(n\,(n + n\log 3)) = O(n^2)$. The optimal algorithm would only require an approximation of each $x_i$ at precision $n + (n-i)\log_2 3$, which again leads to a global complexity of $O(n^2)$. This example should be compared to the computation of a $1$-approximation of the badly balanced sum
\[ 1+(1+(\cdots+(1+1)\cdots)) \]
with $n$ additions. Our balanced algorithm only requires a time $O(n\log n)$, which is essentially optimal in this case, whereas a non-balanced algorithm would yield a quadratic complexity $O(n^2)$.
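Returning to the addition example above, the following self-contained sketch builds the right comb $x_1+(x_2+(\cdots+(x_9+x_{10})\cdots))$ as a toy expression tree and distributes a root tolerance proportionally to the weights. Every leaf ends up with one tenth of the tolerance, instead of the exponentially small shares produced by the halving rule of section 2. The node type and the use of double precision tolerances are illustration-only simplifications.

<\code>
#include <cstdio>
#include <memory>
#include <vector>

// Toy expression nodes: either a leaf or a sum of two subexpressions.
// Leaves have weight 1 and wt (a + b) = wt (a) + wt (b), as in the text.
struct node {
  std::unique_ptr<node> left, right;   // both null for a leaf
  long weight;
  node (): weight (1) {}
  node (std::unique_ptr<node> l, std::unique_ptr<node> r):
    left (std::move (l)), right (std::move (r)),
    weight (left->weight + right->weight) {}
};

// Balanced distribution: each operand receives a share of the
// tolerance which is proportional to its weight.
void distribute (const node* n, double tol, std::vector<double>& leaves) {
  if (!n->left) { leaves.push_back (tol); return; }
  double p= n->left->weight, q= n->right->weight;
  distribute (n->left.get (), tol * p / (p + q), leaves);
  distribute (n->right.get (), tol * q / (p + q), leaves);
}

int main () {
  // Build the right comb x1 + (x2 + (... + (x9 + x10)...)).
  std::unique_ptr<node> e= std::make_unique<node> ();
  for (int i= 0; i < 9; i++)
    e= std::make_unique<node> (std::make_unique<node> (), std::move (e));
  std::vector<double> leaves;
  distribute (e.get (), 1.0, leaves);               // root tolerance 1
  for (double t: leaves) std::printf ("%.3f ", t);  // each leaf gets 1/10
  std::printf ("\n");
}
</code>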
In the case of multiplication, the above additive distribution of tolerances can no longer be used and one has to resort to ``relative tolerances''. Here a relative tolerance $\delta$ for $x$ corresponds to an absolute tolerance of $\varepsilon = \delta\,|x|$. Now let $x$ and $y$ be effective real numbers of weights $p$ and $q$. Assume for simplicity that we are in the regular case when we have good starting approximations $\tilde x, \tilde y \in \mathbb{D}$ for $x$ and $y$, say of relative tolerances at most $2^{-w}$, where $w$ is the machine word size. Then, in order to compute an $\varepsilon$-interval for $x\,y$, we first distribute $\delta = \frac{\varepsilon}{|\tilde x|\,|\tilde y|}\,(1-2^{-w+2})$ over $\delta_x = \frac{p}{p+q}\,\delta$ and $\delta_y = \frac{q}{p+q}\,\delta$, and then compute a $(\delta_x\,|x|)$-interval for $x$ and a $(\delta_y\,|y|)$-interval for $y$. The product of these approximations yields an $\varepsilon$-interval for $x\,y$.

More generally, assume that $x_1,\ldots,x_r$ are effective real numbers of weights $p_1,\ldots,p_r$ and that we want to compute an $\varepsilon$-interval for $y = f(x_1,\ldots,x_r)$, where $f$ is a differentiable $r$-ary operation for which we are given an arbitrary precision evaluation algorithm at dyadic intervals. Before balancing tolerances as above, we first have to determine the quantities which have to be subdivided ($\varepsilon$ in the case of addition and $\delta$ -- really $\delta_x\,|x|$ and $\delta_y\,|y|$, as we will see below -- in the case of multiplication). We define the critical tolerances for $x_1,\ldots,x_r$ to be the minimal numbers $\varepsilon_1,\ldots,\varepsilon_r$ such that for any intervals $[\tilde x_1^-, \tilde x_1^+] \ni x_1,\ldots,[\tilde x_r^-, \tilde x_r^+] \ni x_r$ with
\[ f([\tilde x_1^-, \tilde x_1^+],\ldots,[\tilde x_r^-, \tilde x_r^+]) \subseteq [y-\varepsilon, y+\varepsilon], \qquad (1) \]
we have $\tilde x_i^+ - \tilde x_i^- \le \varepsilon_i$ $(i=1,\ldots,r)$. Let us assume for simplicity that we are in the regular case when there exist (computable) bounds $B_1,\ldots,B_r$ with
\[ 0 < \frac{B_i}{2} \le \left| \frac{\partial f}{\partial x_i} (\tilde x_1,\ldots,\tilde x_r) \right| \le 2\,B_i, \]
which are valid for all $\tilde x_i \in [x_i - \frac{4\varepsilon}{B_i}, x_i + \frac{4\varepsilon}{B_i}]$ $(i=1,\ldots,r)$. Then (1) implies in particular that $\tilde x_i^+ - \tilde x_i^- \le \frac{4\varepsilon}{B_i}$ for $i=1,\ldots,r$. Conversely, $\tilde x_i^+ - \tilde x_i^- \le \frac{\varepsilon}{2\,r\,B_i}$ $(i=1,\ldots,r)$ implies (1). This proves that $\frac{\varepsilon}{2\,r\,B_i} \le \varepsilon_i \le \frac{4\,\varepsilon}{B_i}$. For instance, in the case of addition, we may take $B_1 = B_2 = 1$ and we obtain $\frac{\varepsilon}{4} \le \varepsilon_i \le 4\,\varepsilon$. In the case of multiplication, we may take $B_1 = |\tilde y|$ and $B_2 = |\tilde x|$ and obtain $\varepsilon_1 \approx \frac{\varepsilon}{|\tilde y|}$ and $\varepsilon_2 \approx \frac{\varepsilon}{|\tilde x|}$ up to bounded factors. Now, assuming that we have sharp lower bounds $\tilde\varepsilon_i$ for the $\varepsilon_i$ (like $\tilde\varepsilon_i = \frac{\varepsilon}{2\,r\,B_i}$), we compute a $\bigl(\tilde\varepsilon_i\,\frac{p_i}{p_1+\cdots+p_r}\bigr)$-interval for each $x_i$. Then we obtain the requested $\varepsilon$-interval for $y$ by evaluating $f$ at these intervals at a sufficient precision.

Remark 5. It can be checked by structural induction that this generalized scheme for the distribution of tolerances has the same essential property as in remark 3: given a dag of weight $w$, the precision with respect to which we compute never exceeds the absolutely necessary precision by more than about $\log_2 w$ binary digits.

The approach of balanced error estimates is more delicate in the irregular case. It is instructive to first understand the case of multiplication well; the other operations can probably be treated in a similar way. So assume that we want to compute a product $x\,y$. Let us first consider the semi-regular case when we still have a good initial approximation for one of the arguments, say $x$, but not for the other one. Then we first compute a $\bigl(\frac{\varepsilon}{|\tilde x|}\,(1-2^{-w+2})\bigr)$-interval for $y$. If this yields an approximation $\tilde y$ for $y$ with a good relative tolerance, then we may proceed as in the regular case. Otherwise, we obtain at least an upper bound for $|y|$ and we let this upper bound play the role of $|\tilde y|$. This leaves us with the purely irregular case when we have a strictly positive lower bound neither for $|x|$ nor for $|y|$. In this case, we have the choice between computing a more precise approximation for $x$ or for $y$, but it is not clear which choice is optimal from the complexity point of view. One solution to this problem may be to continuously improve the precisions of the approximations for both $x$ and $y$, while distributing the available computation time equally over $x$ and $y$. It is not clear yet to us how to estimate the complexity of such an algorithm. Another solution, which is easier to implement, is to use rough upper bounds for $|x|$ and $|y|$ instead of $|\tilde x|$ and $|\tilde y|$, while increasing the precision continuously. However, this strategy is certainly not optimal in some cases.
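As a small worked check of the regular case above, the following snippet computes the balanced tolerances for a product and verifies that the worst-case error of the product stays below the requested tolerance. All numerical values are arbitrary choices for illustration, and the slack factor $1-10^{-7}$ plays the role of the factor $1-2^{-w+2}$ above, absorbing the second order term.

<\code>
#include <cstdio>
#include <cmath>

int main () {
  double x= 3.14159, y= 2.71828;   // good starting approximations x~, y~
  double p= 2, q= 3;               // weights of x and y as expressions
  double eps= 1e-6;                // requested absolute tolerance on x*y

  // Relative tolerance to distribute, with a small slack factor.
  double delta= eps / (std::fabs (x) * std::fabs (y)) * (1 - 1e-7);
  double dx= delta * p / (p + q);  // relative tolerance for x
  double dy= delta * q / (p + q);  // relative tolerance for y
  double ax= dx * std::fabs (x);   // absolute tolerance for x
  double ay= dy * std::fabs (y);   // absolute tolerance for y

  // Worst-case error of the product for perturbations within tolerance:
  double err= ax * std::fabs (y) + ay * std::fabs (x) + ax * ay;
  std::printf ("absolute tolerances: %.3g (x), %.3g (y)\n", ax, ay);
  std::printf ("worst-case product error %.6g, eps %.6g\n", err, eps);
  std::printf ("check: %s\n", err <= eps ? "err <= eps" : "err > eps");
}
</code>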
6. Conclusion and notes on optimality

We have presented two new ``adaptive error estimating'' techniques for computing with effective real numbers. We believe these techniques to be optimal in virtually all situations which occur in practice. It remains interesting to study the theoretical optimality in more detail.

This first requires the introduction of a suitable notion of optimality. Assume that we have a library of effective real functions $f_1,\ldots,f_p$, where $f_i$ has arity $r_i$ (constants corresponding to arity $0$). Then any computation using this library involves only a finite number of expressions $E_1,\ldots,E_q$ constructed using the $f_i$. These expressions actually correspond to a directed acyclic graph, since they may share common subexpressions. Assume that the aim of the computation is to obtain an $\varepsilon_i$-approximation for each $E_i$ with $i \in S$, for some subset $S \subseteq \{1,\ldots,q\}$. A certified solution to this problem consists of assigning a dyadic interval to each node of the dag, such that

- The interval $[\tilde y_i^-, \tilde y_i^+]$ associated to each $E_i$ with $i \in S$ satisfies $\tilde y_i^+ - \tilde y_i^- \le 2\,\varepsilon_i$.
- For each subexpression $E = f_j(E_{i_1},\ldots,E_{i_{r_j}})$ with $r_j > 0$, the interval associated to $E$ is obtained by evaluating $f_j$ at the intervals associated to $E_{i_1},\ldots,E_{i_{r_j}}$.

The certified solution is said to be optimal, if the time needed to compute all interval labels is minimal. Modulo rounding errors, finding such an optimal solution corresponds to determining optimal tolerances for the leaves of the dag (which is equivalent to determining the lengths of the intervals associated to the leaves).

Remark 6. The above modeling does simplify reality a bit: in practice, some of the expressions $E_1,\ldots,E_q$ may themselves depend on the way we compute things. Also, we may require the $\varepsilon_i$-approximations for the $E_i$ during the intermediate computations, and in an arbitrary order. A more elaborate model would therefore consider a sequence of problems of the above type, with growing dags, growing sets $S$ and decreasing tolerances $\varepsilon_i$.

Clearly, it is difficult to efficiently find optimal certified solutions. In many cases, it should nevertheless be possible to find almost optimal solutions, for which the cost of the $\varepsilon_i$-approximations exceeds the optimal cost by at most a constant factor. We have already pointed out that our strategies are non-optimal in general (see remarks 1 and 4, and the discussion on the irregular case in section 5). Nevertheless, in some special cases, we do approach optimality. First of all, if we are computing with expression trees (and not dags), then the problem in remark 1 cannot occur and the precision loss in remark 4 is at most logarithmic in the size of the expression. Secondly, assume that all approximation algorithms for the $f_i$ have a polynomial time complexity $O(n^{\alpha_i})$, and consider a computation which essentially spends its time on high precision arithmetic. Then the precision loss in remark 4 only causes a slow-down by a constant factor, since $O((2\,n)^{\alpha_i}) = O(n^{\alpha_i})$.
Bibliography

[1] D. Richardson, How to recognize zero, J. Symbolic Comput. 24 (1997), 627-645.
[2] J. van der Hoeven, Zero-testing, witness conjectures and differential diophantine approximation, Tech. Rep. 2001-62, Prépublications d'Orsay, 2001.
[3] J. Blanck, V. Brattka, P. Hertling (Eds.), Computability and complexity in analysis, Vol. 2064 of Lect. Notes in Comp. Sc., Springer, 2001.
[4] J. Blanck, General purpose exact real arithmetic, Tech. Rep. CSR 21-200, Luleå University of Technology, Sweden, 2002.
[5] M. Grimmer, K. Petras, N. Revol, Multiple precision interval packages: comparing different approaches, Tech. Rep. RR 2003-32, LIP, École Normale Supérieure de Lyon, 2003.
[6] J. van der Hoeven, Automatic asymptotics, Ph.D. thesis, École polytechnique, France, 1997.
[7] J. van der Hoeven, Relax, but don't be too lazy, J. Symbolic Comput. 34 (2002), 479-542.
[8] J. van der Hoeven, GMPX, a C-extension library for GMP, no longer maintained, 1999.
[9] J. van der Hoeven, Mmxlib, a C++ core library for Mathemagix, 2003-2004.
[10] N. Müller, iRRAM, exact arithmetic in C++, 2000-2004.
[11] N. Revol, MPFI, a multiple precision interval arithmetic library, 2001-2004.
[12] T. Granlund et al., GMP, the GNU multiple precision arithmetic library, 1991-2004.
[13] G. Hanrot, V. Lefèvre, K. Ryde, P. Zimmermann, MPFR, a C library for multiple-precision floating-point computations with exact rounding, 2000-2004.
[14] J. van der Hoeven, Lazy multiplication of formal power series, in: W. W. Küchlin (Ed.), Proc. ISSAC '97, Maui, Hawaii, 1997, pp. 17-20.