
Current general purpose libraries for multiple precision floating-point arithmetic, such as GMP [7] and MPFR [6], incur a large overhead for precisions that are only a few times higher than the machine precision.
G.1.0 Computer arithmetic
Multiple precision arithmetic [3] is crucial in areas such as computer algebra and cryptography, and increasingly useful in mathematical physics and numerical analysis [1]. Early multiple precision libraries appeared in the seventies [2], and nowadays GMP [7] and MPFR [6] are typically very efficient for large precisions of more than, say, 1000 bits. However, for precisions that are only a few times larger than the machine precision, these libraries suffer from a large overhead. For instance, the MPFR library for arbitrary precision and IEEE-style standardized floating-point arithmetic is typically about a factor 100 slower than double precision machine arithmetic.
This overhead of multiple precision libraries tends to increase further with the advent of wider SIMD (Single Instruction, Multiple Data) arithmetic in modern processors.
In order to make multiple precision arithmetic more useful in areas such as numerical analysis, it is a major challenge to reduce its overhead for small multiples of the machine precision, and to build libraries with direct SIMD arithmetic for multiple precision floating-point numbers.
One existing approach is based on the “TwoSum” and “TwoProduct” operations [4, 13] that allow for the exact computation of sums and products of two machine floating-point numbers. The results of these operations are represented as sums $x + y$ of two machine numbers whose bit ranges do not overlap. The TwoProduct operation can be implemented using only two instructions when hardware offers the fused-multiply-add (FMA) and fused-multiply-subtract (FMS) instructions, as is for instance the case for AVX2 enabled processors. The TwoSum operation could be done using only two instructions as well if we had similar fused-add-add and fused-add-subtract instructions. Unfortunately, this is not the case for current hardware.
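To make these operations concrete, here is a small Python sketch (Python floats are IEEE-754 doubles with round-to-nearest). The function names are ours: `two_sum` is Knuth's branch-free formulation, and `two_product` uses Dekker's method with Veltkamp splitting, which avoids the need for an FMA instruction; overflow and underflow cases are ignored.

```python
def two_sum(a, b):
    """Knuth's branch-free TwoSum: returns (s, e) with s = fl(a + b)
    and a + b = s + e exactly."""
    s = a + b
    bv = s - a            # the part of b actually absorbed into s
    av = s - bv           # the part of a actually absorbed into s
    e = (a - av) + (b - bv)
    return s, e

def two_product(a, b):
    """Dekker's TwoProduct via Veltkamp splitting (no FMA needed):
    returns (p, e) with p = fl(a * b) and a * b = p + e exactly."""
    C = 134217729.0       # 2**27 + 1, splits a 53-bit double into two 26-bit halves
    def split(x):
        t = C * x
        hi = t - (t - x)
        return hi, x - hi
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

s, e = two_sum(1.0, 2.0 ** -60)
p, q = two_product(1.0 + 2.0 ** -30, 1.0 + 2.0 ** -30)
```

With an FMA instruction, `two_product` collapses to the two operations `p = fl(a*b)` and `e = fma(a, b, -p)`, which is the two-instruction variant mentioned above.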
It is well known that double machine precision arithmetic can be implemented reasonably efficiently in terms of the TwoSum and TwoProduct algorithms [4, 13, 15]. The approach has been further extended in [17, 14] to higher precisions. Specific algorithms are also described in [12] for triple-double precision, and in [9] for quadruple-double precision. But these approaches tend to become inefficient for large precisions.
An alternative approach is to represent floating-point numbers by products $m \cdot b^{e}$, where $m$ is a fixed-point mantissa, $e$ an exponent, and $b$ the base. This representation is used in most of the existing multiple precision libraries, such as GMP [7] and MPFR [6].
Our paper is structured as follows. In section 2, we detail the representation of fixed-point numbers and basic arithmetic operations. We follow a similar approach as in [10], but slightly adapt the representation and corresponding algorithms to allow for larger bit precisions of the mantissa. As in [10], we rely on standard IEEE-754 compliant floating-point arithmetic that is supported by most recent processors and GPUs. For processors with efficient SIMD integer arithmetic, it should be reasonably easy to adapt our algorithms to this kind of arithmetic. Let $p$ be the bit precision of our machine floating-point numbers minus one (so that $p = 52$ for IEEE-754 double precision numbers). Throughout this paper, we represent fixed-point numbers in base $2^{k}$ by $n$-tuplets of machine floating-point numbers, where the word precision $k$ is slightly smaller than $p$.
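The tuplet representation can be illustrated with a toy encoder/decoder. This is our own simplified picture, not the paper's exact data layout: each word holds an integer-valued double of at most $k$ bits with weight $2^{-(i+1)k}$, and the choice $k = 48$ (leaving $p - k = 4$ nail bits) is purely illustrative.

```python
from fractions import Fraction

k = 48  # bits per word, chosen slightly smaller than p = 52 (illustrative)

def decode(words):
    """Exact rational value of a tuplet: word i carries weight 2**(-(i+1)*k)."""
    return sum(Fraction(int(w)) * Fraction(2) ** (-(i + 1) * k)
               for i, w in enumerate(words))

def encode(frac, n):
    """Cut a rational in [0, 1) into an n-word tuplet, truncating the rest.
    Every word is an integer below 2**k, hence exactly representable."""
    words = []
    for _ in range(n):
        scaled = frac * 2 ** k
        w = int(scaled)            # the k bits belonging to this word
        words.append(float(w))
        frac = scaled - w          # remainder feeds the next word
    return words

words = encode(Fraction(1, 3), 3)
```

Decoding `words` recovers 1/3 truncated to $3k = 144$ bits, and every word leaves the top 4 bits of the double's mantissa free for carries.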
The main bottleneck for the implementation of floating-point arithmetic on top of fixed-point arithmetic is shifting. This operation is particularly crucial for addition, since every addition requires three shifts. Section 3 is devoted to this topic and we will show how to implement reasonably efficient shifting algorithms for SIMD vectors of fixed-point numbers. More precisely, small shifts (of fewer than $k$ bits) can be done in parallel using a number of operations that is linear in $n$, whereas arbitrary shifts are somewhat more expensive.
In section 4, we show how to implement arbitrary precision floating-point arithmetic in base $2$. Our approach is fairly standard. On the one hand, we use the “left shifting” procedures from section 3 in order to normalize floating-point numbers (so that mantissas of nonzero numbers are always “sufficiently large” in absolute value). On the other hand, the “right shifting” procedures are used to work with respect to a common exponent in the cases of addition and subtraction. We also discuss a first strategy to reduce the cost of shifting and summarize the corresponding operation counts in Table 2. In section 5, we perform a similar analysis for arithmetic in base $2^{k}$. This leads to slightly less compact representations, but shifting is reduced to multiple word shifting in this setting. The resulting operation counts can be found in Table 3.
The operation counts in Tables 2 and 3 really represent the worst case scenario in which our implementations of the basic arithmetic operations are required to be “black boxes”. Multiple precision arithmetic can be made far more efficient if we allow ourselves to open up these boxes when needed. For instance, any number of floating-point numbers can be added using a single function call by generalizing the addition algorithms from sections 4 and 5 to take more than two arguments; this can become almost three times as efficient as the repeated use of ordinary additions. A similar approach can be applied to entire algorithms such as the FFT [10]: we first shift the inputs so that they all admit the same exponent and then use a fixed-point algorithm for computing the FFT. We intend to come back to this type of optimization in a forthcoming paper.
So far, we have only checked the correctness of our algorithms using a prototype implementation. Our operation count analysis indicates that our approach should outperform the others as soon as the number of words becomes moderately large, and maybe even for the smallest word counts.
Another direction for future investigations concerns correct rounding and full compliance with the IEEE standard, following the example of MPFR [6].
Throughout this paper, we assume IEEE arithmetic with correct rounding and we denote by $\mathbb{F}$ the set of machine floating-point numbers. We let $p$ be the machine precision minus one (which corresponds to the number of fractional bits of the mantissa) and let $E_{\min}$ and $E_{\max}$ be the minimal and maximal exponents of machine floating-point numbers. For IEEE double precision numbers, this means that $p = 52$, $E_{\min} = -1022$, and $E_{\max} = 1023$.
In this paper, and contrary to [10], the rounding mode is always assumed to be “round to nearest”. Given $x, y \in \mathbb{F}$ and an operation $\circ \in \{+, -, \times\}$, we denote by $x \mathbin{\tilde{\circ}} y$ the rounding of $x \circ y$ to the nearest. For the convenience of the reader, we simply write $x \circ y$ whenever the result is provably exact in the given context. If $e$ is the exponent of $x \circ y$ and $E_{\min} \leqslant e \leqslant E_{\max}$ (i.e. in the absence of overflow and underflow), then we notice that $|x \mathbin{\tilde{\circ}} y - x \circ y| \leqslant 2^{e - p - 1}$. For efficiency reasons, the algorithms in this paper do not attempt to check for underflows, overflows, and other exceptional cases.
Modern processors usually support fused-multiply-add (FMA) and fused-multiply-subtract (FMS) instructions, both for scalar and SIMD vector operands. Throughout this paper, we assume that these instructions are indeed present, and we denote by $\operatorname{FMA}(x, y, z)$ and $\operatorname{FMS}(x, y, z)$ the roundings of $x y + z$ and $x y - z$ to the nearest.
Acknowledgment. We are very grateful to the third referee for various suggestions and for drawing our attention to several more or less serious errors in an earlier version of this paper.
Let $k \leqslant p$ and $n \geqslant 1$. In this section, we start with a survey of efficient fixed-point arithmetic at bit precision $n k$. We recall the approach from [10], but use a slightly different representation in order to allow for high precisions $n k$. We adapted the algorithms from [10] accordingly and explicitly state the adapted versions. Allowing $k$ to be smaller than $p$ corresponds to using a redundant number representation that makes it possible to efficiently handle carries during intermediate computations. We denote by $\delta = p - k$ the number of extra carry bits (these bits are sometimes called “nails”, following GMP [7]). We refer to section 3.5 for a table that recapitulates the operation counts for the algorithms in this section.
Given $k \leqslant p$ and an integer $n \geqslant 1$, we denote by $\mathbb{F}_{n, k}$ the set of numbers of the form

$x = x_{0} + x_{1} + \cdots + x_{n - 1}$,

where $x_{0}, \ldots, x_{n - 1} \in \mathbb{F}$ are such that each $x_{i}$ is an integer multiple of $2^{-(i + 1) k}$. We write $x = [x_{0}, \ldots, x_{n - 1}]$ for numbers of the above form and abbreviate $\mathbb{F}_{n} = \mathbb{F}_{n, k}$. Numbers whose coefficients moreover satisfy $0 \leqslant x_{i} 2^{(i + 1) k} < 2^{k}$ for all $i \geqslant 1$ are said to be in carry normal form.
Figure 1. Schematic representation of a fixed-point number $x = [x_{0}, \ldots, x_{n - 1}]$. 
An important subalgorithm for efficient fixed-point arithmetic computes the truncation of a floating-point number at a given exponent:
Algorithm Split$_{e}$($x$) 
return the truncation of $x$ at exponent $e$ 
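One classical way to realize such a truncation under round-to-nearest is the “magic constant” trick sketched below. This is our own illustration (not necessarily the exact routine of [10]): the constant and the precondition on $|x|$ are indicative only.

```python
P = 52  # fractional mantissa bits of an IEEE-754 double

def split(x, e):
    """Truncate x at exponent e: returns (hi, lo) where hi is x rounded to
    the nearest multiple of 2**e and x = hi + lo exactly.  Assumes
    round-to-nearest mode and |x| well below 2**(P + e)."""
    c = 1.5 * 2.0 ** (P + e)   # adding c forces the bits below 2**e to be rounded away
    hi = (x + c) - c           # exact apart from the single rounding at exponent e
    lo = x - hi                # exact (the operands are close in magnitude)
    return hi, lo

hi, lo = split(3.75, 0)        # rounds 3.75 to the nearest integer multiple of 2**0
```

The trick works because, near $1.5 \cdot 2^{P + e}$, the spacing between consecutive doubles is exactly $2^{e}$, so the addition itself performs the rounding.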
Proposition 1 from [10] becomes as follows for “rounding to nearest”:
Numbers can be rewritten in carry normal form using carry propagation. This can be achieved as follows (with the loop, of course, being unrolled in practice):
Algorithm CarryNormalize($[x_{0}, \ldots, x_{n - 1}]$) 
for $i$ from $n - 1$ down to $1$ do
  split $x_{i}$ into its carry part and its low part, and add the carry to $x_{i - 1}$
return $[x_{0}, \ldots, x_{n - 1}]$ 
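A word-level sketch of this carry propagation, on the illustrative integer-valued encoding used earlier (names and the choice $K = 48$ are ours):

```python
K = 48  # bits per word; the p - K spare nail bits hold intermediate carries

def carry_normalize(words):
    """Propagate carries from the least significant word up, so that every
    word except possibly the leading one lies in [0, 2**K).  Words hold
    integer values; the represented value is unchanged."""
    w = list(words)
    for i in range(len(w) - 1, 0, -1):
        carry = float(int(w[i]) >> K)   # bits that overflow word i ...
        w[i] -= carry * 2.0 ** K        # ... are removed from it ...
        w[i - 1] += carry               # ... and added to the next word up
    return w

out = carry_normalize([1.0, 2.0 ** 48 + 5.0, 3.0])
```

Here the middle word overflows by exactly one unit, which migrates into the leading word; all arithmetic stays exact because every value fits in the double's 53-bit mantissa.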
A straightforward adaptation of [10, Proposition 2] yields:
Non-normalized addition and subtraction of fixed-point numbers are straightforward:



Proposition 9 from [10] now becomes:
The multiplication algorithm for fixed-point numbers is based on a subalgorithm LongMul$_{e}$ that computes the exact product of two numbers in the form of a sum $h + l$, with the additional constraint that $h$ is a multiple of $2^{e}$. Without this additional constraint (and in the absence of overflow and underflow), $h$ and $l$ can be computed using the well known “Two Product” algorithm: $h := x \mathbin{\tilde{\times}} y$, $l := \operatorname{FMS}(x, y, h)$. The LongMul$_{e}$ algorithm exploits the FMA and FMS instructions in a similar way.
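The two-instruction “Two Product” can be sketched as follows. Portable Python lacks a hardware FMA, so we emulate a correctly rounded `fma` with exact rational arithmetic (this stand-in is our own; on AVX2-class hardware it is a single instruction):

```python
from fractions import Fraction

def fma(a, b, c):
    """Correctly rounded a*b + c, emulating the hardware FMA instruction.
    float(Fraction) performs correct round-to-nearest in CPython."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def long_mul(a, b):
    """Two-instruction exact product: h = fl(a*b) and l = fma(a, b, -h),
    so that a * b = h + l exactly (no overflow/underflow assumed)."""
    h = a * b
    l = fma(a, b, -h)
    return h, l

h, l = long_mul(1.0 + 2.0 ** -30, 1.0 - 2.0 ** -30)
```

The exact product is $1 - 2^{-60}$, which rounds to $h = 1$; the FMA then recovers the residual $l = -2^{-60}$ exactly.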


Proposition 4 from [10] becomes as follows for “rounding to nearest”:
For $\delta$ large enough as a function of $n$, one may use the following algorithm for multiplication (all loops again being unrolled in practice):
Algorithm Multiply($[x_{0}, \ldots, x_{n - 1}]$, $[y_{0}, \ldots, y_{n - 1}]$) 
for $i$ from $0$ to $n - 1$ do
  for $j$ from $0$ to $n - 1 - i$ do
    accumulate the exact product of $x_{i}$ and $y_{j}$ into words $i + j$ and $i + j + 1$ using LongMul
for $i$ from $n - 1$ down to $1$ do
  propagate the carry of word $i$ into word $i - 1$
return $[r_{0}, \ldots, r_{n - 1}]$ 
Notice that we need the additional line at the end with respect to [10] in order to ensure that the result is again in carry normal form. Adapting [10, Proposition 10], we obtain:
In this section, we discuss several routines for shifting fixed-point numbers. All algorithms were designed to be naturally parallel. More precisely, we logically operate on SIMD vectors of fixed-point numbers. Internally, we rather work with fixed-point numbers whose “coefficients” are machine SIMD vectors. All arithmetic operations can then simply be performed in SIMD fashion using hardware instructions. For compatibility with the notations from the previous section, we omit the bold face for SIMD vectors, except if we want to stress their SIMD nature. For actual implementations for a given $n$, we also understand that we systematically use in-place versions of our algorithms and that all loops and function calls are completely expanded.
Let $x = [x_{0}, \ldots, x_{n - 1}]$ be a fixed-point number. Left shifts $2^{s} x$ and right shifts $2^{-s} x$ with $0 \leqslant s < k$ can be computed by cutting the shifted coefficients $2^{s} x_{i}$ resp. $2^{-s} x_{i}$ using the Split routine and reassembling the pieces. The shift routines behave very much like generalizations of the routine for carry normalization.



Proof. In the main loop, we observe that and . The assumption guarantees that Proposition 3 applies, so that and . The operation is therefore exact, so that , , and . Since , we also have . Now if and if . Combined with the facts that and , it follows that the operation is also exact. Moreover, if and if . The last operation is clearly exact and . Finally,
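The cut-and-reassemble strategy for small shifts can be sketched on the illustrative integer-valued word encoding (names and $K = 48$ are ours): each word is cut at bit position $K - s$, the top $s$ bits of each word migrating into the word above.

```python
K = 48  # bits per word (illustrative)

def shift_left_small(words, s):
    """Shift an n-word tuplet (word i has weight 2**(-(i+1)*K)) left by
    s < K bits.  Each word is cut into the K - s bits that stay and the
    s bits that spill into the word above; bits shifted out of the
    leading word are discarded."""
    assert 0 <= s < K
    n = len(words)
    w = [int(x) for x in words]
    out = []
    for i in range(n):
        keep = (w[i] & ((1 << (K - s)) - 1)) << s           # low K-s bits, moved up
        spill = (w[i + 1] >> (K - s)) if i + 1 < n else 0   # top s bits of next word
        out.append(float(keep | spill))
    return out

res = shift_left_small([1.0, 2.0, 0.0], 4)
```

Each output word is assembled from exactly two cuts, which is why the cost stays linear in the number of words.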
For shifts by $\ell k$ bits with $0 \leqslant \ell < n$, we may directly shift the coefficients of the operand. Let $\ell_{r - 1} \cdots \ell_{0}$ be the binary representation of $\ell$, with $r = \lceil \log_{2} n \rceil$. Then we may decompose a shift by $\ell$ words as the composition of shifts by either $0$ or $2^{j}$ words for $j = 0, \ldots, r - 1$, depending on whether $\ell_{j} = 0$ or $\ell_{j} = 1$. This way of proceeding has the advantage of being straightforward to parallelize, assuming that we have an instruction to extract a new SIMD vector from two given SIMD vectors according to a mask.
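A scalar Python sketch of this binary decomposition (our own illustration): at step $j$, every entry of the vector is either shifted by $2^{j}$ words or left alone, a blend that maps directly onto SIMD mask/select instructions.

```python
def large_shift_right_words(tuplets, shifts):
    """Shift each n-word tuplet right by a per-entry number of whole words.
    The shift amount is decomposed in binary: at step j an entry is either
    shifted by 2**j words or kept, depending on bit j of its shift amount."""
    n = len(tuplets[0])
    out = [list(t) for t in tuplets]
    j = 0
    while (1 << j) < n:
        step = 1 << j
        for t, s in zip(out, shifts):
            if s & step:   # blend: select the shifted words where the mask bit is set
                t[:] = [0.0] * step + t[:n - step]
        j += 1
    return out

res = large_shift_right_words([[1.0, 2.0, 3.0, 4.0],
                               [5.0, 6.0, 7.0, 8.0]], [1, 2])
```

Only $\lceil \log_{2} n \rceil$ blend steps are needed, regardless of how the shift amounts differ between the entries.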



The following propositions are straightforward to prove.
Combining with the algorithms from the previous subsection, we obtain the routines ShiftLeft and ShiftRight below for general shifts. Notice that we shift by at most $n k$ bits. Due to the fact that we allow for $\delta$ nail bits, the maximal error of these routines remains bounded.



The routines LargeShiftLeft and LargeShiftRight were designed to work for SIMD vectors of fixed-point numbers and vectors of shift amounts that may differ between the entries. If the number of bits by which we shift is the same for all entries, then we may use the following routines instead:



The IEEE-754 standard provides an operation logb for retrieving the exponent of a machine number $x$: if $x \neq 0$, then $2^{\operatorname{logb}(x)} \leqslant |x| < 2^{\operatorname{logb}(x) + 1}$. It is natural to ask for a similar function on $\mathbb{F}_{n, k}$, as well as a similar function in base $2^{k}$ (that returns the word exponent of $x$ for every $x \in \mathbb{F}_{n, k}$ with $x \neq 0$). For $x = 0$, we understand that these functions return a special value.
The functions LogB and LogW below are approximations for such functions. More precisely, for all $x$ in carry normal form with $x \neq 0$, they satisfy the bound (2).
The routine LogB relies on the computation of a machine number that approximates $x$, which only works under an assumption on the size of $n$. It is nevertheless easy to adapt the routine to higher precisions by cutting $x$ into chunks of machine numbers. The routine LogW relies on a routine HiWord that determines the smallest index $i$ with $x_{i} \neq 0$.
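The machine-level building blocks can be sketched in Python (illustrative only; the sentinel returned for zero is an arbitrary choice of ours):

```python
import math

def logb(x):
    """Exponent e with 2**e <= |x| < 2**(e+1), mimicking the IEEE-754
    logB operation; returns a large negative sentinel for x == 0."""
    if x == 0.0:
        return -2 ** 31
    m, e = math.frexp(x)    # x = m * 2**e with 0.5 <= |m| < 1
    return e - 1

def hi_word(words):
    """Index of the most significant nonzero word of a tuplet
    (words are ordered from most to least significant)."""
    for i, w in enumerate(words):
        if w != 0.0:
            return i
    return len(words) - 1
```

From `hi_word` one obtains a word exponent in base $2^{k}$, and `logb` of the leading word refines it to a bit exponent.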







Proof. In Compress, let be the value of after adding and notice that for all . If for all , then and Compress returns an exact result. Otherwise, let be minimal such that . Then and , whence and . Moreover, the exponent of is at least . For , we have , whence the exponent of is at most . This means that , whence and . The bound (2) directly follows.
In Table 1 below, we show the number of machine operations that are required for the fixed-point operations from this and the previous section, as a function of $n$. Of course, the actual time complexity of the algorithms also depends on the latency and throughput of the machine instructions. We count an assignment as a single instruction.
Let $k$, $p$, and $n$ be as in section 2. We will represent floating-point numbers as products

$x = m \cdot 2^{e}$,

where $m \in \mathbb{F}_{n, k}$ is the mantissa of $x$ and $e$ its exponent in base $2$. We denote by $\mathbb{F}_{n, k, 2}$ the set of all such floating-point numbers. We assume that the exponents can be represented exactly, by machine integers or numbers in $\mathbb{F}$. As in most existing multiple precision libraries, we use an extended exponent range. For multiple precision arithmetic, it is indeed natural to support exponents of at least as many bits as the precision itself.
In this section, we assume that the base is $2$. The main advantage of this base is that floating-point numbers can be represented in a compact way. The main disadvantage is that normalization involves expensive shifts by general numbers of bits that are not necessarily multiples of $k$. Notice that base $2$ is also used in the MPFR library [6].
Given $x = m \cdot 2^{e}$, we denote its mantissa by $m_{x} := m$ and its exponent by $e_{x} := e$. Numbers whose mantissas are in carry normal form are again said to be in carry normal form, and the routine CarryNormalize extends to floating-point numbers by applying it to the mantissa. We say that a floating-point number $x$ with $m_{x} \neq 0$ is in dot normal form if we either have $m_{x} \geqslant 1/2$ or $m_{x} \leqslant -1/2$. If $x$ is also in carry normal form, then we simply say that $x$ is in normal form. In the absence of overflows, normalizations can be computed using the routine Normalize below. In the case when the number to be normalized is the product of two normalized floating-point numbers, we only need to shift the mantissa by at most two bits, and it is faster to use the variant QuickNormalize.



Addition and subtraction of normalized floating-point numbers are computed as usual: by putting the mantissas under a common exponent, by adding or subtracting the shifted mantissas, and by normalizing the result. By increasing the common exponent by two, we also make sure that the most significant word of the addition or subtraction never provokes a carry.
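The alignment scheme just described can be sketched with plain integer mantissas, a simplification of the word-tuplet representation (the names, the fixed target precision, and the truncating right shift are our own illustration):

```python
def fp_add(m1, e1, m2, e2, prec=64):
    """Add two floating-point numbers m * 2**e (integer mantissas):
    bring both mantissas under the common, larger exponent, add, then
    normalize back to `prec` significant bits.  Bits of the smaller
    operand shifted out during alignment are truncated."""
    if e1 < e2:
        (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
    m = m1 + (m2 >> (e1 - e2))      # align m2 under the common exponent e1
    # normalize: scale the mantissa back to exactly `prec` bits (if nonzero)
    while m != 0 and m.bit_length() > prec:
        m >>= 1
        e1 += 1
    while m != 0 and m.bit_length() < prec:
        m <<= 1
        e1 -= 1
    return m, e1

m, e = fp_add(3 << 62, 0, 1 << 62, -62, prec=64)
```

Here the second operand is 62 binary orders of magnitude smaller, so after alignment it contributes a single unit to the last place of the result.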



Floating-point multiplication is almost as efficient as its fixed-point analogue, using the following straightforward routine:


If we are computing with a SIMD vector of floating-point numbers, then another way to speed up shifting is to let all entries share the same exponent $e$. Of course, this strategy may compromise the accuracy of some of the computations. Nevertheless, if all entries are of similar orders of magnitude, then the loss of accuracy is usually limited to a few bits. Moreover, by counting the number of leading zeros of the mantissa of a particular entry, it is possible to monitor the loss of accuracy, and redo some of the computations if necessary.
When sharing exponents, it becomes possible to use UniformShiftLeft and UniformShiftRight instead of LargeShiftLeft and LargeShiftRight for multiple word shifts. The routine LogW should also be adjusted: instead of the vector of individual exponents of the entries, it should now return a vector whose entries all equal their maximum. Modulo these changes, we may use the same routines as above for basic floating-point arithmetic.
In Table 2, we give the operation counts for the various variants of the basic arithmetic operations (addition, subtraction, and multiplication) that we have discussed so far. For comparison, we have also shown the operation count for the algorithm from [14] that is based on floating-point expansions (notice that the effective precision is somewhat better for this algorithm, since our algorithms rather use $k < p$ bits per word).
Due to the cost of shifting, we observe that additions and subtractions are the most costly operations for small and medium precisions. For larger precisions, multiplication becomes the main bottleneck.


Table 2. Operation counts for arithmetic floating-point operations in base $2$. 
As we can see in Table 2, additions and subtractions are quite expensive when working in base $2$. This is due to the facts that shifting is expensive with respect to this base and that every addition involves three shifts. For this reason, we will now examine floating-point arithmetic in the alternative base $2^{k}$. One major disadvantage of this base is that normalized nonzero floating-point numbers may start with as many as $k - 1$ leading zero bits. This means that $n$ should be increased by one in order to achieve a similar accuracy. We notice that the base $2^{k}$ is also used in the native mpf layer for floating-point arithmetic in the GMP library [7].
Carry normal forms are defined in the same way as in base $2$. We say that a floating-point number $x = m \cdot 2^{k e}$ with $m \neq 0$ is in dot normal form if the leading word of $m$ is nonzero. We say that $x$ is in normal form if it is both in carry normal form and in dot normal form.
In section 4.2, we increased the common exponent of the summands of an addition by two in order to avoid carries. When working in base $2^{k}$, a similar trick would systematically shift out the least significant words of the summands. Instead, it is better to allow for carries, but to adjust the routine for normalization accordingly, by temporarily working with mantissas of $n + 1$ words instead of $n$.

Modulo this change, the routines for addition and subtraction are now as follows:



The dot normalization of a product becomes very particular when working in base $2^{k}$, since it can always be accomplished using a large shift by either $0$, $k$, or $2 k$ bits. Let QuickNormalize be the variant of Normalize obtained by replacing the general shift by such a large shift. In order to achieve an accuracy of at least about $n k$ bits, we extend the mantissas by one machine coefficient before multiplying them. The routine QuickNormalize suppresses the extra entry.



In Table 3, we give the operation counts for floating-point addition, subtraction and multiplication in base $2^{k}$. In a similar way as in section 4.3, it is possible to share exponents, and the table includes the operation counts for this strategy. This time, additions and subtractions are always cheaper than multiplications.


Table 3. Operation counts for arithmetic floating-point operations in base $2^{k}$. 
[1] D. H. Bailey, R. Barrio, and J. M. Borwein. High precision computation: mathematical physics and dynamics. Appl. Math. Comput., 218:10106–10121, 2012.
[2] R. P. Brent. A Fortran multiple-precision arithmetic package. ACM Trans. Math. Software, 4:57–70, 1978.
[3] R. P. Brent and P. Zimmermann. Modern Computer Arithmetic. Cambridge University Press, 2010.
[4] T. J. Dekker. A floating-point technique for extending the available precision. Numer. Math., 18(3):224–242, 1971.
[5] N. Emmart, J. Luitjens, C. C. Weems, and C. Woolley. Optimizing modular multiplication for NVIDIA's Maxwell GPUs. In Paolo Montuschi, Michael Schulte, Javier Hormigo, Stuart Oberman, and Nathalie Revol, editors, 23rd IEEE Symposium on Computer Arithmetic, ARITH 2016, Silicon Valley, CA, USA, July 10–13, 2016, pages 47–54. IEEE, 2016.
[6] L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann. MPFR: a multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Software, 33(2), 2007. Software available at http://www.mpfr.org.
[7] T. Granlund et al. GMP, the GNU multiple precision arithmetic library. http://gmplib.org, from 1991.
[8] D. Harvey, J. van der Hoeven, and G. Lecerf. Even faster integer multiplication. Journal of Complexity, 36:1–30, 2016.
[9] Yozo Hida, Xiaoye S. Li, and D. H. Bailey. Algorithms for quad-double precision floating-point arithmetic. In Proc. 15th IEEE Symposium on Computer Arithmetic, pages 155–162. IEEE, 2001.
[10] J. van der Hoeven and G. Lecerf. Faster FFTs in medium precision. In 22nd Symposium on Computer Arithmetic (ARITH), pages 75–82, June 2015.
[11] A. Karatsuba and J. Ofman. Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7:595–596, 1963.
[12] C. Lauter. Basic building blocks for a triple-double intermediate format. Technical Report RR2005-38, LIP, ENS Lyon, 2005.
[13] J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lefèvre, G. Melquiond, N. Revol, D. Stehlé, and S. Torres. Handbook of Floating-Point Arithmetic. Birkhäuser Boston, 2010.
[14] Jean-Michel Muller, Valentina Popescu, and Ping Tak Peter Tang. A new multiplication algorithm for extended precision using floating-point expansions. In Paolo Montuschi, Michael Schulte, Javier Hormigo, Stuart Oberman, and Nathalie Revol, editors, 23rd IEEE Symposium on Computer Arithmetic, ARITH 2016, Silicon Valley, CA, USA, July 10–13, 2016, pages 39–46, 2016.
[15] T. Nagai, H. Yoshida, H. Kuroda, and Y. Kanada. Fast quadruple precision arithmetic library on parallel computer SR11000/J2. In Computational Science – ICCS 2008, 8th International Conference, Kraków, Poland, June 23–25, 2008, Proceedings, Part I, pages 446–455, 2008.
[16] J.M. Pollard. The fast Fourier transform in a finite field. Mathematics of Computation, 25(114):365–374, 1971.
[17] D. M. Priest. Algorithms for arbitrary precision floating-point arithmetic. In Proc. 10th Symposium on Computer Arithmetic, pages 132–145. IEEE, 1991.
[18] A. Schönhage and V. Strassen. Schnelle Multiplikation großer Zahlen. Computing, 7:281–292, 1971.
[19] D. Takahashi. Implementation of multiple-precision floating-point arithmetic on Intel Xeon Phi coprocessors. In Computational Science and Its Applications – ICCSA 2016: 16th International Conference, Beijing, China, July 4–7, 2016, Proceedings, Part II, pages 60–70, Cham, 2016. Springer International Publishing.