Laboratoire d'informatique, UMR 7161 CNRS
Campus de l'École polytechnique
1, rue Honoré d'Estienne d'Orves
Bâtiment Alan Turing, CS35003
91120 Palaiseau

Abstract. Current general purpose libraries for multiple precision floating-point arithmetic suffer from a large performance penalty with respect to hard-wired instructions. The performance gap tends to become even larger with the advent of wider SIMD arithmetic in both CPUs and GPUs. In this paper, we present efficient algorithms for multiple precision floating-point arithmetic that are suitable for implementations on SIMD processors.

ACM classification: G.1.0 Computer arithmetic

Multiple precision arithmetic is crucial in areas such as computer algebra and cryptography, and increasingly useful in mathematical physics and numerical analysis. Early multiple precision libraries appeared in the seventies, and nowadays GMP and MPFR are typically very efficient for large precisions of more than, say, 1000 bits. However, for precisions that are only a few times larger than the machine precision, these libraries suffer from a large overhead. For instance, the MPFR library for arbitrary precision and IEEE-style standardized floating-point arithmetic is typically about a factor 100 slower than double precision machine arithmetic.

This overhead of multiple precision libraries tends to increase further with the advent of wider SIMD (single instruction, multiple data) arithmetic in modern processors, such as Intel's AVX technology. Indeed, it is hard to take advantage of wide SIMD instructions when implementing basic arithmetic for individual numbers of only a few words. In order to fully exploit SIMD instructions, one should rather operate on suitably represented SIMD vectors of multiple precision numbers. A second problem with current SIMD arithmetic is that CPU vendors tend to favor wide floating-point arithmetic over wide integer arithmetic, whereas faster integer arithmetic is most useful for speeding up multiple precision libraries. In order to make multiple precision arithmetic more useful in areas such as numerical analysis, it is a major challenge to reduce the overhead of multiple precision arithmetic for small multiples of the machine precision, and to build libraries with direct SIMD arithmetic for multiple precision floating-point numbers.

One existing approach is based on the "two sum" and "two product" operations that allow for the exact computation of sums and products of two machine floating-point numbers. The results of these operations are represented as sums x + y where x and y have no "overlapping bits" (that is, ⌊log_2 |x|⌋ ≥ ⌊log_2 |y|⌋ + 53 or y = 0). The two product operation can be implemented using only two instructions when hardware offers the fused-multiply-add (FMA) and fused-multiply-subtract (FMS) instructions, as is for instance the case for AVX2 enabled processors. The two sum operation could be done using only two instructions as well if we had similar fused-add-add and fused-add-subtract instructions. Unfortunately, this is not the case for current hardware.

It is well known that double machine precision arithmetic can be implemented reasonably efficiently in terms of the two sum and two product algorithms. This approach has been further extended to higher precisions. Specific algorithms have also been described for triple-double precision and for quadruple-double precision. But these approaches tend to become inefficient for large precisions.

An alternative approach is to represent floating-point numbers by products m·b^e, where m is a fixed-point mantissa, e an exponent, and b the base (we will consider both b = 2 and b = 2^p).
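To fix ideas, the following C fragment sketches these two building blocks; the names two_prod and two_sum and the use of the C99 fma function are merely illustrative of the standard error-free transformations just described, assuming round-to-nearest double precision arithmetic.

#include <math.h>

/* Exact product: h + *l == x * y exactly (barring overflow/underflow),
 * using one multiplication and one fused multiply-subtract.            */
static double two_prod(double x, double y, double *l)
{
    double h = x * y;        /* rounded product                         */
    *l = fma(x, y, -h);      /* exact error term x*y - h                */
    return h;
}

/* Exact sum: h + *l == x + y exactly.  Six ordinary operations are
 * needed, since current hardware offers no fused add-add instruction.  */
static double two_sum(double x, double y, double *l)
{
    double h  = x + y;
    double xp = h - y;       /* recovered approximation of x            */
    double yp = h - xp;      /* recovered approximation of y            */
    *l = (x - xp) + (y - yp);
    return h;
}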
Such a mantissa-exponent representation is used in most of the existing multiple precision libraries. However, the authors are only aware of sequential implementations of this approach. In this paper we examine the efficiency of this approach on SIMD processors. As in previous work, we systematically work with vectors of multiple precision numbers rather than with vectors of "digits". We refer to the literature for some other recent approaches.

Our paper is structured as follows. In section 2, we detail the representation of fixed-point numbers and basic arithmetic operations. We follow a similar approach as in previous work, but slightly adapt the representation and the corresponding algorithms in order to allow for larger bit precisions of the mantissa. As before, we rely on standard IEEE-754 compliant floating-point arithmetic, which is supported by most recent processors and GPUs. For processors with efficient SIMD integer arithmetic, it should be reasonably easy to adapt our algorithms to this kind of arithmetic. Let μ be the bit precision of our machine floating-point numbers minus one (so that μ = 52 for IEEE-754 double precision numbers). Throughout this paper, we represent fixed-point numbers in base 2^{-p} by k-tuplets of machine floating-point numbers, where p is slightly smaller than μ and k ≥ 2.

The main bottleneck for the implementation of floating-point arithmetic on top of fixed-point arithmetic is shifting. This operation is particularly crucial for addition, since every addition requires three shifts. Section 3 is devoted to this topic and we will show how to implement reasonably efficient shifting algorithms for SIMD vectors of fixed-point numbers. More precisely, small shifts (of less than p bits) can be done in parallel using a number of operations that is linear in k, whereas arbitrary shifts require approximately (log_2 k + 4)·k operations.

In section 4, we show how to implement arbitrary precision floating-point arithmetic in base 2. Our approach is fairly standard. On the one hand, we use the "left shifting" procedures from section 3 in order to normalize floating-point numbers (so that mantissas of nonzero numbers are always "sufficiently large" in absolute value). On the other hand, the "right shifting" procedures are used to work with respect to a common exponent in the cases of addition and subtraction. We also discuss a first strategy to reduce the cost of shifting and summarize the corresponding operation counts in Table 2. In section 5, we perform a similar analysis for arithmetic in base 2^p. This leads to slightly less compact representations, but shifting is reduced to multiple word shifting in this setting. The resulting operation counts can be found in Table 3.

The operation counts in Tables 2 and 3 really represent the worst case scenario in which our implementations of the basic arithmetic operations are required to be "black boxes". Multiple precision arithmetic can be made far more efficient if we allow ourselves to open up these boxes when needed. For instance, any number of floating-point numbers can be added using a single function call by generalizing the addition algorithms from sections 4 and 5 to take more than two arguments; this can become almost thrice as efficient as the repeated use of ordinary additions. A similar approach can be applied to entire algorithms such as the FFT: we first shift the inputs so that they all admit the same exponent and then use a fixed-point algorithm for computing the FFT. We intend to come back to this type of optimizations in a forthcoming paper.
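As an illustration of what an SIMD vector of multiple precision numbers may look like in memory, the following C sketch shows one possible structure-of-arrays layout; the type name fixed_simd and the constants W and K are our own choices for this example and are not prescribed by the algorithms below.

enum { W = 4 };   /* SIMD width, e.g. four doubles per AVX2 register     */
enum { K = 3 };   /* number of words per fixed-point number (k)          */

/* One SIMD vector of K-word fixed-point numbers, stored word-major:
 * x[i][j] holds word i of the j-th number, so that each row x[i] can
 * be loaded into a single hardware vector register.                     */
typedef struct {
    double x[K][W];
} fixed_simd;

With such a layout, every word-wise operation of the algorithms below becomes a single SIMD instruction applied to an entire row.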
So far, we have only checked the correctness of our algorithms using a prototype implementation. Our operation count analysis indicates that our approach should outperform others as soon as k ≥ 5, and maybe even for smaller values of k. Another direction for future investigations concerns correct rounding and full compliance with the IEEE standard, following the example of libraries such as MPFR.

Notations. Throughout this paper, we assume IEEE arithmetic with correct rounding and we denote by 𝔻 the set of machine floating-point numbers. We let μ be the machine precision minus one (which corresponds to the number of fractional bits of the mantissa) and let E_min and E_max be the minimal and maximal exponents of machine floating-point numbers. For IEEE double precision numbers, this means that μ = 52, E_min = −1022 and E_max = 1023.

In this paper, and contrary to our previous work, the rounding mode is always assumed to be "round to nearest". Given x, y ∈ 𝔻 and an arithmetic operation ∘, we denote by RN(x ∘ y) the rounding of x ∘ y to the nearest machine number. For the convenience of the reader, we indicate explicitly whenever a rounding RN(x ∘ y) = x ∘ y is provably exact in the given context. If e is the exponent of x ∘ y and E_min ≤ e ≤ E_max (i.e. in the absence of overflow and underflow), then we notice that |RN(x ∘ y) − x ∘ y| ≤ 2^{e−μ−1}. For efficiency reasons, the algorithms in this paper do not attempt to check for underflows, overflows, and other exceptional cases.

Modern processors usually support fused-multiply-add (FMA) and fused-multiply-subtract (FMS) instructions, both for scalar and SIMD vector operands. Throughout this paper, we assume that these instructions are indeed present, and we denote by FMA(x, y, z) and FMS(x, y, z) the roundings of x·y + z and x·y − z to the nearest.

Acknowledgment. We are very grateful to the third referee for various suggestions and for drawing our attention to several more or less serious errors in an earlier version of this paper.

Let p ≤ μ − 2 and k ≥ 2. In this section, we start with a survey of efficient fixed-point arithmetic at bit precision k·p. We recall the approach from our previous work, but use a slightly different representation in order to allow for higher precisions (k > 19). We adapted the algorithms accordingly and explicitly state the adapted versions. Allowing p to be smaller than μ corresponds to using a redundant number representation that makes it possible to efficiently handle carries during intermediate computations. We denote by δ = μ − p ≥ 2 the number of extra carry bits (these bits are sometimes called "nails", following GMP). We refer to section 3.5 for a table that recapitulates the operation counts for the algorithms in this section.

Given 1 ≤ C ≤ 2^δ and an integer k ≥ 2, we denote by 𝔽_{p,k,C} the set of numbers of the form

  x = x_0 + x_1·2^{−p} + ⋯ + x_{k−1}·2^{−(k−1)·p},

where x_0, …, x_{k−1} ∈ 𝔻 are such that x_i·2^p ∈ ℤ and |x_i| ≤ C for 0 ≤ i < k. We write [x_0, …, x_{k−1}] for numbers of the above form and abbreviate 𝔽_{p,k} := 𝔽_{p,k,1}. Numbers in 𝔽_{p,k} are said to be in carry normal form.

Figure 1. Schematic representation of a fixed-point number x = [x_0, x_1] = x_0 + x_1·2^{−p}, where δ = ⌈log_2 C⌉.

Remark. Our previous paper rather uses the representation x_0 + ⋯ + x_{k−1}, in which the scale of each word is embedded into the word itself, with x_i ∈ 2^{−(i+1)·p}·ℤ and |x_i| ≤ C·2^{−i·p}. This representation is slightly more efficient for small k, since it allows one to save one operation in the implementation of the multiplication algorithm. However, it is limited to small values of k (typically k ≤ 19), since k·p is then required to be smaller than −E_min − μ.
The representation() also makes it easier to implement efficient multiplication algorithms at high precisions , such as Karatsuba's algorithm or FFT-based methods. We intend to return to this issue in a forthcoming paper. <\remark> Another minor change with respect to is that we also require \\*2> to hold for the last index . In order to meet this additional requirement, we need two additional instructions at the end of the multiplication routine in section below. An important subalgorithm for efficient fixed-point arithmetic computes the truncation of afloating-point number at a given exponent: <\named-algorithm|Split>()> \2>|)>> \2>|)>> Proposition1 from becomes as follows for \Prounding to nearest\Q: <\proposition> Given \> and ,\,E-\|}>> such that \2-2>>, the algorithm > computes a number \\> with \\*2> and -x|\|>\2>. Numbers can be rewritten in carry normal form using carry propagition. This can be achieved as follows (of course, the loop being unrolled in practice): <\named-algorithm|CarryNormalize()> \x> <\indent> \|)>> \-c|)>> \+c*2|)>> \r> ,\,|]>> A straightforward adaptation of 2> yields: <\proposition> Given a fixed-point number \> with |\|>\2>-2-p>> and 2>-2-p>>, the algorithm > returns \\+2>> with =x>. Non normalized addition and subtraction of fixed-point numbers is straightforward: <\padded> >| <\named-algorithm|Add>> +y|)>,\,+y|)>|]>> |<\cell> \; |<\cell> <\named-algorithm|Subtract>> -y|)>,\,-y|)>|]>> >>>> Proposition9 from now becomes: <\proposition> Let \> and \> with C+D+2\2>>. If +y|\|>\2>>, then > satisfies \>. If -y|\|>\2>>, then >> satisfies \>. The multiplication algorithm of fixed-point numbers is based on a subalgorithm > that computes the exact product of two numbers \> in the form of a sum >, with the additional constraint that \*2>. Without this additional constraint (and in absence of overflow and underflow), and > can be computed using the well known \PTwo Product\Q algorithm: >, \>. The > algorithm exploits the FMA and FMS instructions in a similar way. <\padded> >| <\named-algorithm|LongMul>>> \2>|)>> \2>|)>> \> |)>> |<\cell> \; >>>> Proposition4 from becomes as follows for \Prounding to nearest\Q: <\proposition> Let \> and +\,\,E-\|}>> be such that \2+e-2>> and \*2>>. Then the algorithm > computes a pair |)>\\> with \*2>, =x*y>, and |\|>\2>. For large enough as a function of , one may use the following algorithm for multiplication (all loops again being unrolled in practice): <\named-algorithm|Multiply()> ,r|)>\,y|)>> <\indent> |)>\,y|)>> \*2+h|)>> <\indent> |)>\,y|)>> \+h|)>> \+\|)>> \*2|)>> <\indent> \*y+r|)>> \|)>> ,\,r|]>> Notice that we need the additional line \|)>> at the end with respect to in order to ensure that \\*2>. Adapting 10>, we obtain: <\proposition> Let \> and \> with |\|>\C>, |\|>\D>, 2-2>>, and *|)>\2>>. Then \\> with \S*2>. In this section, we discuss several routines for shifting fixed-point numbers. All algorithms were designed to be naturally parallel. More precisely, we logically operate on SIMD vectors =,\,x|)>> of fixed-point numbers =,\,x|]>>. Internally, we rather work with fixed point numbers =,\,\|]>> whose \Pcoefficients\Q are machine SIMD vectors =,\,x|)>>>. All arithmetic operations can then simply be performed in SIMD fashion using hardware instructions. For compatibility with the notations from the previous section, we omit the bold face for SIMD vectors, except if we want to stress their SIMD nature. 
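Before turning to the shift routines themselves, the following scalar C sketch illustrates the flavour of the splitting and double length multiplication primitives of section 2; the function names split_at and long_mul and the magic-constant formulation are chosen for illustration only, and the code is only claimed to be correct under the indicated magnitude assumptions for round-to-nearest double precision.

#include <math.h>

/* Round x to the nearest multiple of 2^e, assuming |x| < 2^(e+51) so
 * that x + c lies in a binade whose unit in the last place is 2^e.
 * Compile without value-changing optimizations such as -ffast-math.    */
static double split_at(double x, int e)
{
    double c = ldexp(3.0, e + 51);   /* 3 * 2^(e+51) = 1.5 * 2^(e+52)    */
    return (x + c) - c;              /* both operations round to nearest */
}

/* Double length product truncated at exponent e: returns h, a multiple
 * of 2^e, and *l such that h + *l == x * y exactly, provided the
 * magnitudes of x, y and the exponent e satisfy the above assumptions. */
static double long_mul(double x, double y, int e, double *l)
{
    double c = ldexp(3.0, e + 51);
    double h = fma(x, y, c) - c;     /* product rounded at exponent e    */
    *l = fma(x, y, -h);              /* exact remainder                  */
    return h;
}

Such code vectorizes directly, since it only uses additions, subtractions and fused multiplications.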
For actual implementations for agiven , we also understand that we systematically use in-place versions of our algorithms and that all loops and function calls are completely expanded. Let ,\,x|]>> be a fixed-point number. Left shifts > and right shifts > of with s\p> can be computed by cutting the shifted coefficients *2> *2> using the routine > and reassembling the pieces. The shift routines behave very much like generalizations of the routine for carry normalization. <\padded> >| <\named-algorithm|SmallShiftLeft()> 2> \|)>> <\indent> \+\2>|)>> \-\2>|)>> \-h|)>> \*2+h|)>> \*2|)>> ,\,r|]>> |<\cell> \; |<\cell> <\named-algorithm|SmallShiftRight()> 2> \+\2>|)>> \-\2>|)>> <\indent> \+\2>|)>> \-\2>|)>> \-h|)>> \*2+h|)>> ,r,\,r|]>> >>>> <\proposition> Let ,p|}>>. Given a fixed-point number \> with 2-s-2>> and |\|>\2-s>-C*2-2>, the algorithm > returns \+C*2+2>> with >. <\proof> In the main loop, we observe that \|)>> and \2*\>. The assumption 2-s-2>> guarantees that Proposition applies, so that \\*2> and -u*x|\|>\2>. The operation \-h|)>> is therefore exact, so that =h+\>, *\\2>, and |\|>\2>. Since |\|>\C*2>, we also have |\|>\C*2+2>. Now *2|\|>\> if 1> and *2|\|>\2>-C*2-2> if . Combined with the facts that |\|>\C*2+2> and ,\*2\\*2>, it follows that the operation \*2+h|)>> is also exact. Moreover, |\|>\+C*2+2> if 1> and |\|>\2>> if . The last operation \*2|)>> is clearly exact and |\|>=*2|\|>\>. Finally, <\eqnarray*> |r*2>||r*2>+r*2>>>|||\*2>+h*2>>>|||*2+\*2>+h*2>>>|||*2+x*2>>|||*x*2.>>>> This proves that >. <\proposition> Let ,p|}>>. Given a fixed-point number \> with 2+s-2>> and |\|>\min+s-2>,2>|)>>, the algorithm > returns \+C*2+2>> with |\|>\2>. <\proof> Similar to the proof of Proposition. For shifts by *p> bits with \k>, we may directly shift the coefficients > of the operand. Let =\+\*2+\+\-1>*2-1>> be the binary representation of > with ,\,\-1>\>. Then we decompose a shift by *p> bits as the composition of > shifts by either or *p> bits for ,\-1>, depending on whether =0> or =1>. This way of proceeding has the advantage of being straightforward to parallelize, assuming that we have an instruction to extract a new SIMD vector from two given SIMD vectors according to a mask. On processors, there are several \Pblend\Q instructions for this purpose. In the pseudo-code below, we simply used \Pif expressions\Q instead. <\padded> >| <\named-algorithm|LargeShiftLeft(>)> 1> k> <\indent> d|)>\0> <\indent> \ > >>|)>> <\indent> \ >>|)>> 2*d> ,\,x|]>> |<\cell> \; |<\cell> <\named-algorithm|LargeShiftRight(>)> 1> k> <\indent> d|)>\0> <\indent> \ > >>|)>> <\indent> \ >>|)>> 2*d> ,\,x|]>> >>>> The following propositions are straightforward to prove. <\proposition> Let ,*p|}>>. Given a fixed-point number \> with =\=x=0>, the algorithm > returns \> with >. <\proposition> Let ,*p|}>>. Given a fixed-point number \> with |\|>\C>, the algorithm > returns \> with |\|>\C*|)>*2>. Combining with the algorithms from the previous subsection, we obtain the routines and below for general shifts. Notice that we shift by at most bits. Due to the fact that we allow for nail bits, the maximal error is bounded by *2>. <\padded> >| <\named-algorithm|ShiftLeft()> \min|s/p|\>,k-1|)>> min *p|)>> |)>> > |<\cell> \; |<\cell> <\named-algorithm|ShiftRight()> \min|s/p|\>,k-1|)>> min *p|)>> |)>> > >>>> The routines and were designed to work for SIMD vectors =,\,x|)>> of fixed-point numbers and shift amounts =,\,\|)>>. 
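When the shift amounts differ from lane to lane, the mask-based selection mentioned above can be expressed with standard Intel intrinsics; the fragment below only illustrates this selection step, and the surrounding data layout is an assumption of the example.

#include <immintrin.h>

/* One selection step of a word-granular shift on an AVX vector: for
 * every lane, pick the word from 'shifted' when the sign bit of the
 * corresponding mask lane is set, and the word from 'orig' otherwise.  */
static inline __m256d select_words(__m256d orig, __m256d shifted, __m256d mask)
{
    return _mm256_blendv_pd(orig, shifted, mask);
}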
If the number of bits by which we shift is the same =\=\=\> for all entries, then we may use the following routines instead: <\padded> >| <\named-algorithm|UniformShiftLeft(>)> <\indent> i+\> \ k> > >|)>> ,\,x|]>> |<\cell> \; |<\cell> <\named-algorithm|UniformShiftRight(>)> <\indent> i-\> \ 0> > >|)>> ,\,x|]>> >>>> The IEEE-754 standard provides an operation for retrieving the exponent of a machine number \>: if 0>, then \\2>. It is natural to ask for a similar function on >, as well as a similar function in base > (that returns the exponent with \\2> for every \> with 0>). For , we understand that =logw=-\>. The functions and below are approximations for such functions and . More precisely, for all \> in carry normal form with \1>, we have <\eqnarray*> -1\>|>|>>|-1\>|>|.>>>> The routine relies on the computation of a number =\\> with -x|\|>\*2>> that only works under the assumption that -E>. It is nevertheless easy to adapt the routine to higher precisions by cutting into chunks of |-E/p|\>> machine numbers. The routine relies on aroutine that determines the smallest index with \0>. <\padded> >|| <\named-algorithm|Compress()> <\math> \\+x*2|)> <\indent> \+x*2|)>> > |<\cell> \; |<\cell> <\named-algorithm|HiWord()> \> <\indent> =0> >|)>> >|||>| <\named-algorithm|LogB()> *>|)>|)>|)>> |<\cell> \; |<\cell> <\named-algorithm|LogW()> > >>>> <\proposition> The routines >, >, > and > are correct. <\proof> In , let > be the value of > after adding *2> and notice that \2> for all . If =\+x*2> for all , then =x> and returns an exact result. Otherwise, let be minimal such that \\+x*2>. Then +x*2|\|>\2+1-*p>> and |\|>\2+1-*p>>, whence *2|\|>\*2-1>*|\|>> and --x*2|)>|\|>\2-1>*|\|>>. Moreover, the exponent of > is at least +1-*p>. For i>, we have *2|\|>\*2>, whence the exponent of *2> is at most *p-1>. This means that +x*2|)>=\>, whence =\=\=\> and -x|\|>\--x*2|)>|\|>+*2*p>+\+x*2*p>|\|>\2-1>*|\|>+2*p>\2>*|\|>>. The bound() directly follows. The algorithm is clearly correct. Assume that ,0|]>> and let be minimal with \0>. Then we have *2|\|>\2*p>>, whereas *2*p>+\+x*2*p>|\|>\*2*p>>, so that *2*p>\>. If 0>, then we also get *2|\|>\*2>, whence \*2>. If , then \1> by assumption. This shows that() holds as well. In Table below, we have shown the number of machine operations that are required for the fixed-point operations from this and the previous section, as a function of . Of course, the actual time complexity of the algorithms also depends on the latency and throughput of machine instructions. We count an assignment >|)>> as a single instruction. <\big-table||||>|||||||||||>||||||||||||>||||||||||||>||||||||||||>||||||||||||>||||||||||||>||||||||||||>||||||||||||>||||||||||||>||||||||||||>>>>> Operation counts in terms of machine instructions (the results are not necessarily normalized). > Let , and > be as in section. We will represent floating-point numbers as products <\eqnarray*> ||,>>>> where \> is the of and \\>,\,2>|}>> its in base 2>>. We denote by *b>> the set of all such floating-point numbers. We assume that the exponents in> can be represented exactly, by machine integers or numbers in >. As in most existing multiple precision libraries, we use an extended exponent range. For multiple precision arithmetic, it is indeed natural to support exponents of at least as many bits as the precisionitself. In this section, we assume that . The main advantage of this base is that floating-point numbers can be represented in a compact way. 
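To make this representation concrete, an SIMD vector of such floating-point numbers could be laid out as follows in C; the type name, the field names and the constants W and K are illustrative choices rather than part of the algorithms.

#include <stdint.h>

enum { W = 4 };    /* SIMD width (illustrative)                           */
enum { K = 3 };    /* number of words per mantissa                        */

/* One SIMD vector of floating-point numbers m * 2^e in base 2:
 * word-major fixed-point mantissas plus one extended range exponent
 * per lane, stored as a 64-bit integer.                                  */
typedef struct {
    double  m[K][W];   /* mantissa words                                  */
    int64_t e[W];      /* exponents with at least as many bits as the
                          precision itself, as discussed above            */
} mpf_simd;

A shared-exponent variant, as considered further below, would store a single exponent for all W lanes instead of one exponent per lane.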
The main disadvantage is that normalization involves expensive shifts by general numbers of bits that are not necessarily multiples of . Notice that is also used in the library for multiple precision floating-point arithmetic. Given \C\2>>, we denote *2>=:m\\,e\\|}>>. Numbers in >*2>> are again said to be in and the routine extends to *2>> by applying it to the mantissa. We say that a floating-point number > with \> is in if we either have or -2\\1>. If is also in carry normal form, then we simply say that is in . In absence of overflows, normalizations can be computed using the routine below. In the case when the number to be normalized is the product of two normalized floating-point numbers, we need to shift the mantissa by at most two bits, and it is faster to use the variant . <\padded> >| <\named-algorithm|Normalize|)>>> \> -1-|)>> \,s|)>> \e-s> \max ,-2>|)>> \ =0> >> >>|)>> *2>> |<\cell> \; |<\cell> <\named-algorithm|QuickNormalize|)>>> -1-,m|]>|)>> \> \e-s> \max ,-2>|)>> *2>> >>>> \; <\proposition> Given \\*2>> with \1> and p*k-2>>, the algorithm > returns a normal number *2>\\*2>> with *2>=m*2>. If \2+2-p>>, then so does >. <\proof> If , then returns ,0|]>> and it is easy to verify that the proposition holds. Assume therefore that 0>, so that k*p>. By Proposition, we have \\+2>>. Using Proposition, it follows that \\> with ++2|)>*2-p>+2\+2\>, whence > is carry normal. We also have =m*2> and />|)>\|\|>\2>, whence -2\|\|>\1>. This also implies |\|>\0> and =e-s>, since -m|\|>\C*2+\+C*2*p>>. We conclude that *2>> is dot normal and its value coincides with >. If \2+2-p>>, then it can be checked that \,m|]>|)>> still satisfies -m|\|>\*2>> and that p-2>. From Proposition, it again follows that \\> with ++2|)>*2+2\>, whence > is carry normal. The remainder of the correctness of follows as before. Addition and subtraction of normalized floating-point numbers are computed as usual, by putting the mantissa under a common exponent, by adding or subtracting the shifted mantissas, and by normalizing the result. By increasing the common exponent by two, we also make sure that the most significant word of the addition or subtraction never provokes a carry. <\padded> >| <\named-algorithm|Add(*2>,m*2>>)> max,e|)>+2> \,e-e|)>> \,e-e|)>> ,m|)>> |)>> |<\cell> \; |<\cell> <\named-algorithm|Subtract(*2>,m*2>>)> max,e|)>+2> \,e-e|)>> \,e-e|)>> ,m|)>> |)>> >>>> Floating-point multiplication is almost as efficient as its fixed-point analogue, using the following straightforward: <\padded> >| <\named-algorithm|Multiply(*2>,m*2>>)> ,m|)>> e+e> |)>> |<\cell> \; >>>> <\remark> Strictly speaking, for normal numbers *2>> and *2>>, it is still necessary to prove that ,m|)>|\|>\1>. This conclusion can only be violated if |\|>> and |\|>> are very close to one. Now it can be shown that a carry normal mantissa with 1> and -k*p>\m\1> is necessarily of the form ,0,-\|]>>. For numbers of this form, it is easy to check that ,m|)>|\|>\1>. If we are computing with an SIMD vector *2>=*2>,\,m*2>|)>> of floating-point numbers, then another way to speed up shifting is to let all entries share the same exponent =\=e>. Of course, this strategy may compromise the accuracy of some of the computations. Nevertheless, if all entries are of similar order of magnitude, then the loss of accuracy is usually limited to afew bits. Moreover, by counting the number of leading zeros of the mantissa of a particular entry *2>, it is possible to monitor the loss of accuracy, and redo some of the computations if necessary. 
When sharing exponents, it becomes possible to use and instead of and for multiple word shifts. The routine should also be adjusted: if the individual exponents given by the vector =,\,e|)>>, then should now return the vector,e|)>>> with ,\,e|)>>. Modulo these changes, we may use the same routines as above for basic floating-point arithmetic. In Table, we give the operation counts for the various variants of the basic arithmetic operations, and > that we have discussed so far. For comparison, we have also shown the operation count for the algorithm from that is based on floating-point expansions (notice that > is somewhat better for this algorithm, since our algorithms rather use \-4>). \ Due to the cost of shifting, we observe that additions and subtractions are the most costly operations for small and medium precisions 8>. For larger precisions 12>, multiplication becomes the main bottleneck. <\big-table|||||||||||||||||>|||||||||||>|>>||||||||||||>|||||||||||||>|>>||||||||||||>|||||||||||||>|>>||||||||||||>>>>> Operation counts for arithmetic floating-point operations in base . >> As we can see in Table, additions and subtractions are quite expensive when working in base. This is due to the facts that shifting is expensive with respect to this base and that every addition involves three shifts. For this reason, we will now examine floating-point arithmetic in the alternative base>. One major disadvantage of this base is that normalized non zero floating-point numbers may start with as many as leading zero bits. This means that should be increased by one in order to achieve asimilar accuracy. We notice that the base > is also used in the native layer for floating-point arithmetic in the library.\ Carry normal forms are defined in the same way as in base . We say that a floating-point number > with \> is in if \> and either or \0>. We say that is in if it is both in carry normal form and in dot normal form. In section, we increased the common exponent of the summands of an addition by two in order to avoid carries. When working in base >, a similar trick would systematically shift out the least significant words of the summands. Instead, it is better to allow for carries, but to adjust the routine for normalization accordingly, by temporarily working with mantissas of words instead of . <\padded> >| <\named-algorithm|Normalize(>)> \,\,m|]>> \|)>> \|)>> \,\|)>> \e+1-\> \max ,-2>|)>> \ =0> >> >>|)>> ,\,m|]>*2*p>> >>>> Modulo this change, the routines for addition and subtract are now as follows: <\padded> >| <\named-algorithm|Add(*2*p>,m*2*p>>)> max,e|)>> \,e-e|)>> \,e-e|)>> ,m|)>> |)>> |<\cell> \; |<\cell> <\named-algorithm|Subtract(*2*p>,m*2*p>>)> max,e|)>> \,e-e|)>> \,e-e|)>> ,m|)>> |)>> >>>> The dot normalization of a product becomes very particular when working in base > since this can always be accomplished using a large shift by either or or bits. Let >> be the variant of > obtained by replacing the condition k> by 2>. In order to achieve an accuracy of about *p> bits at least, we extend the mantissas by one machine coefficient before multiplying them. The routine suppresses the extra entry. <\padded> >| <\named-algorithm|QuickNormalize|)>>> \> \,m,m|]>|)>> \>,\|)>> \e-\> \max ,-2>|)>> ,\,m|]>*2*p>> |<\cell> \; |<\cell> <\named-algorithm|Multiply(*2*p>,m*2*p>>)> \|)>,\,|)>,0|]>> \|)>,\,|)>,0|]>> ,m|)>> e+e> |)>> >>>> In Table, we give the operation counts for floating-point addition, subtraction and multiplication in base >. 
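Since normalization in base 2^p only ever shifts by whole words, its core can be sketched as follows for a single scalar number; this is merely an illustration of the underlying idea, not the vectorized Normalize routine above, and the name normalize_base_2p is introduced only for this example.

/* Scalar illustration of dot normalization in base 2^p: count the
 * leading zero words of the mantissa, move the remaining words up by
 * that many positions, pad with zeros, and decrease the exponent
 * (which counts words) accordingly.  A zero mantissa is left alone.    */
static void normalize_base_2p(double m[], int k, long *e)
{
    int z = 0;
    while (z < k && m[z] == 0.0)
        z++;                          /* number of leading zero words   */
    if (z == 0 || z == k)
        return;                       /* already normal, or zero        */
    for (int i = 0; i + z < k; i++)
        m[i] = m[i + z];              /* shift left by z whole words    */
    for (int i = k - z; i < k; i++)
        m[i] = 0.0;
    *e -= z;
}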
In a similar way as in section 4.3, it is possible to share exponents; Table 3 includes the operation counts for this strategy. This time, additions and subtractions are always cheaper than multiplications.

Table 3. Operation counts for arithmetic floating-point operations in base 2^p.

Bibliography

D. H. Bailey, R. Barrio, and J. M. Borwein. High-precision computation: mathematical physics and dynamics. Applied Mathematics and Computation, 218:10106–10121, 2012.

R. P. Brent. A Fortran multiple-precision arithmetic package. ACM Transactions on Mathematical Software, 4:57–70, 1978.

R. P. Brent and P. Zimmermann. Modern Computer Arithmetic. Cambridge University Press, 2010.

T. J. Dekker. A floating-point technique for extending the available precision. Numerische Mathematik, 18(3):224–242, 1971.

N. Emmart, J. Luitjens, C. C. Weems, and C. Woolley. Optimizing modular multiplication for NVIDIA's Maxwell GPUs. In P. Montuschi, M. Schulte, J. Hormigo, S. Oberman, and N. Revol, editors, 23rd IEEE Symposium on Computer Arithmetic, pages 47–54. IEEE, 2016.

L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann. MPFR: a multiple-precision binary floating-point library with correct rounding. ACM Transactions on Mathematical Software, 33(2), 2007. Software available at https://www.mpfr.org.

T. Granlund et al. GMP, the GNU multiple precision arithmetic library. https://gmplib.org, from 1991.

D. Harvey, J. van der Hoeven, and G. Lecerf. Even faster integer multiplication. Journal of Complexity, 36:1–30, 2016.

Y. Hida, X. S. Li, and D. H. Bailey. Algorithms for quad-double precision floating-point arithmetic. In 15th IEEE Symposium on Computer Arithmetic, pages 155–162. IEEE, 2001.

J. van der Hoeven and G. Lecerf. Faster FFTs in medium precision. In 22nd IEEE Symposium on Computer Arithmetic, pages 75–82, June 2015.

A. Karatsuba and J. Ofman. Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7:595–596, 1963.

C. Lauter. Basic building blocks for a triple-double intermediate format. Technical Report RR2005-38, LIP, ENS Lyon, 2005.

J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lefèvre, G. Melquiond, N. Revol, D. Stehlé, and S. Torres. Handbook of Floating-Point Arithmetic. Birkhäuser Boston, 2010.

J.-M. Muller, V. Popescu, and P. T. P. Tang. A new multiplication algorithm for extended precision using floating-point expansions. In P. Montuschi, M. Schulte, J. Hormigo, S. Oberman, and N. Revol, editors, 23rd IEEE Symposium on Computer Arithmetic, pages 39–46, 2016.

T. Nagai, H. Yoshida, H. Kuroda, and Y. Kanada. Fast quadruple precision arithmetic library on parallel computer SR11000/J2. Pages 446–455, 2008.

J. M. Pollard. The fast Fourier transform in a finite field. Mathematics of Computation, 25(114):365–374, 1971.

D. M. Priest. Algorithms for arbitrary precision floating-point arithmetic. In 10th IEEE Symposium on Computer Arithmetic, pages 132–145. IEEE, 1991.

A. Schönhage and V. Strassen. Schnelle Multiplikation großer Zahlen. Computing, 7:281–292, 1971.

D. Takahashi. Implementation of multiple-precision floating-point arithmetic on Intel Xeon Phi coprocessors. Pages 60–70, Cham, 2016. Springer International Publishing.