<\body> <\hide-preamble> >> \; ||<\author-affiliation> Laboratoire d'informatique, UMR 7161 CNRS Campus de l'École polytechnique 1, rue Honoré d'Estienne d'Orves Bâtiment Alan Turing, CS35003 91120 Palaiseau >>|>> The Chinese remainder theorem is a key tool for the design of efficient multi-modular algorithms. In this paper, we study the case when the moduli ,\,m>> are fixed and can even be chosen by the user. If > is small or moderately large, then we show how to choose that allow for speedier Chinese remaindering. The multiplication of integer matrices is one typical application where we expect practical gains for various common matrix dimensions and bitsizes of the coefficients. |> >>>>>>>>>>>>>> Modular reduction is an important tool for speeding up computations in computer arithmetic, symbolic computation, and elsewhere. The technique allows to reduce a problem that involves large integer or polynomial coefficients to one or more similar problems that only involve small modular coefficients. Depending on the application, the solution to the initial problem is reconstructed the Chinese remainder theorem or Hensel's lemma. We refer to5> for a gentle introduction to this topic. In this paper, we will mainly be concerned with multi-modular algorithms over the integers that rely on the Chinese remainder theorem. Given \> with 1>, we will denote by \\,m-1|}>> the remainder of the Euclidean division of by . Given an r> matrix \r>> with integer coefficients, we will also denote \r>> for the matrix with coefficients =A rem m>. One typical application of Chinese remaindering is the multiplication of r> integer matrices \r>>. Assuming that we have abound with |\|>\M>> for all>, we proceed as follows: <\enumerate> Select moduli ,\,m>> with *\*m>\M> that are mutually coprime. Compute > and > for ,\>. Multiply \|)>*|)> rem m> for ,\>. Reconstruct from the > with ,\>. The simultaneous computation of rem m> from > for all ,\> is called the problem of . In step , we need to perform > multi-modular reductions for the coefficients of and . The inverse problem of reconstructing rem M> from the rem m> with ,\> is called the problem of . We need to perform > such reconstructions in step 3. Our hypothesis on allows us to recover from. Let us quickly examine when and why the above strategy pays off. In this paper, the number> should be small or moderately large, say \64>. The moduli ,\,m>>> typically fit into amachine word. Denoting by > the bitsize of a machine word (say =32> or =64>), the coefficients of and should therefore be of bitsize \*\/2>. For small>, integer multiplications of bitsize *\/2> are usually carried out using anaive algorithm, of complexity|)>>. If we directly compute the product using> naive integer multiplications, the computation time is therefore of order *\|)>>. In comparison, as we will see, one naive multi-modular reduction or reconstruction for > moduli roughly requires |)>> machine operations, whereas an r>matrix multiplication modulo any of the > can be done in time |)>>. Altogether, this means that the above multi-modular algorithm for integer matrix multiplication has running time *r+r*\|)>>>, which is ,r|)>|)>> times faster than the naivealgorithm. If \r>, then the cost *r|)>> of steps 1 and 3 is negligible with respect to the cost*\|)>> of step 2. However, if > and are of the same order of magnitude, then Chinese remaindering may take an important part of the computation time; the main purpose of this paper is to reduce this cost. If \r>, then we notice that other algorithms for matrix multiplication usually become faster, such as naive multiplication for small >, Karatsuba multiplication for larger>, or FFT-based techniques for very large >. Two observations are crucial for reducing the cost of Chinese remaindering. First of all, the moduli ,\,m>> are the same for all > multi-modular reductions and > multi-modular reconstructions in steps and . If is large, then this means that we can essentially assume that ,\,m>> were fixed once and for all. Secondly, we are free to choose ,\,m>> in any way that suits us. We will exploit these observations by precomputing for which Chinese remaindering can be performed more efficiently than for ordinary moduli. The first idea behind gentle moduli is to consider moduli > of the form -\>, where is somewhat smaller than >, where is even, and \2>. In section, we will show that multi-modular reduction and reconstruction both become a lot simpler for such moduli. Secondly, each> can be factored as =-\|)>*+\|)>> and, if we are lucky, then both -\> and +\> can be factored into moduli of bitsize\>. If we are very lucky, then this allows us to obtain > moduli > of bitsizew> that are mutually coprime and for which Chinese remaindering can be implemented efficiently. Let us briefly outline the structure of this paper. In section, we rapidly recall basic facts about Chinese remaindering and naive algorithms for this task. In section, we introduce gentle moduli and describe how to speed up Chinese remaindering with respect to such moduli. The last section is dedicated to the brute force search of gentle moduli for specific values of and . We implemented a sieving method in which allowed us to compute tables with gentle moduli. For practical purposes, it turns out that gentle moduli exist in sufficient number for 8>. We expect our technique to be efficient for \s>, but this still needs to be confirmed an actual implementation. The application to integer matrix multiplication in section also has not been implemented yet. Let us finally discuss a few related results. In this paper, we have chosen to highlight integer matrix multiplication as one typical application in computer algebra. Multi-modular methods are used in many other areas and the operations of multi-modular reduction and reconstruction are also known as conversions between the positional number system(PNS) and the residue number system(RNS). Asymptotically fast algorithms are based on trees>, with recent improvements in; we expect such algorithms to become more efficient when > exceeds >. Special moduli of the form -\> are also known as . They have been exploited before in cryptography in asimilar way as in section, but with aslightly different focus: whereas the authors of are keen on reducing the number of additions(good for circuit complexity), we rather optimize the number of machine instructions on recent general purpose CPUs (good for software implementations). Our idea to choose moduli -\> that can be factored into smaller moduli is new. Other algorithms for speeding up multiple multi-modular reductions and reconstructions for the same moduli(while allowing for additional pre-computations) have recently been proposed in. These algorithms essentially replace all divisions by simple multiplications and can be used in conjunction with our new algorithms for conversions between residues modulo =m*\*m> and residues modulo ,\,m>. For any integer 1>, we recall that =,m-1|}>>. For machine computations, it is convenient to use the following effective version of the Chinese remainder theorem: <\render-theorem|Chinese Remainder Theorem> Let ,\,m>> be positive integers that are mutually coprime and denote *\*m>. There exist ,\,c>\\> such that for any \\>,\,a>\\>>>, the number <\eqnarray*> ||*a+\+c>*a>|)> rem M>>>> satisfies =a> for ,\>. <\proof> For each ,\>, let =M/m>. Since > and > are coprime, > admits an inverse > modulo > in >>. We claim that we may take =\*u>. Indeed, for *a+\+c>*a>|)> rem M> and any ,\|}>>, we have <\eqnarray*> |>|*\*u+\+a>*\>*u>|)>>>>> Since > is divisible by > for all i>, this congruence relation simplifies into <\eqnarray*> |>|*\*u\a|)>.>>>> This proves our claim and the theorem. <\notation*> We call ,\,c>> the cofactors for ,\,m>> in and also denote these numbers by ;M>=c,\,c>;M>=c>>. For practical computations, the moduli > are usually chosen such that they fit into single machine words. Let > denote the bitsize of a machine word, so that we typically have =32> or =64>. It depends on specifics of the processor how basic arithmetic operations modulo> can be implemented most efficiently. For instance, some processors have instructions for multiplying two >-bit integers and return the exact|)>>-bit product. If not, then we rather have to assume that the moduli> fit into /2> instead of > bits, or replace > by /2>. Some processors do not provide efficient integer arithmetic at all. In that case, one might rather wish to rely on floating point arithmetic and take =52> (assuming that we have hardware support for \ double precision). For floating point arithmetic it also matters whether the processor offers a \Pfused-multiply-add\Q (FMA) instruction; this essentially provides us with an efficient way to multiply two >-bit integers exactly using floating point arithmetic. It is also recommended to choose moduli > that fit into slightly less than > bits whenever possible. Such extra bits can be used to significantly accelerate implementations of modular arithmetic. For a more detailed survey of practically efficient algorithms for modular arithmetic, we refer to. Let ,\,m>>, *\*m>>, \\>,\,a>\\>>> and \> be as in the Chinese remainder theorem. We will refer to the computation of ,\,a>> as a function of as the problem of . The inverse problem is called . In what follows, we assume that ,\,m>> have been fixed once and for all. The simplest way to perform multi-modular reduction is to simply take <\eqnarray*> >|>|,\|)>.>>>> Inversely, the Chinese remainder theorem provides us with a formula for multi-modular reconstruction: <\eqnarray*> |>|;M>*a+\+c>;M>*a>|)> rem M.>>>> Since ,\,m>> are fixed, the computation of the cofactors ;M>> can be regarded as aprecomputation. Assume that our hardware provides an instruction for the exact multiplication of two integers that fit into a machine word. If > fits into a machine word, then so does the remainder =x rem m>. Cutting ;M>> into > machine words, it follows that the product ;M>*a> can be computed using > hardware products and > hardware additions. Inversely, the Euclidean division of an >-word integer by > can be done using +O> multiplications and +O> additions/subtractions: we essentially have to multiply the quotient by > and subtract the result from ; each next word of the quotient is obtained through a one word multiplication with an approximate inverse of >. The above analysis shows that the naive algorithm for multi-modular reduction of modulo ,\,m>>> requires +O|)>> hardware multiplications and +O|)>> additions. The multi-modular reconstruction of can be done using only +O|)>> multiplications and +O|)>> additions. Depending on the hardware, the moduli >, and the way we implement things, |)>> more operations may be required for the carry handling\Vbut it is beyond the scope of this paper to go into this level of detail. Let us now reconsider the naive algorithms from section, but in the case when the moduli ,\,m>> are all close to a specific power of two. More precisely, we assume that <\eqnarray*> >||+\,\|)>,>>>> where |\|>\2> and 2> a small number. As usual, we assume that the > are pairwise coprime and we let *\*m>>. We also assume that is slightly smaller than > and that we have a hardware instruction for the exact multiplication of >-bit integers. For moduli > as in(), the naive algorithm for the Euclidean division of anumber \*s*w>>>> by > becomes particularly simple and essentially boils down to the multiplication of> with the quotient of this division. In other words, the remainder can be computed using \*s> hardware multiplications. In comparison, the algorithm from section requires2*\*s> multiplication when applied to >-bit (instead of -bit) integers. More generally, the computation of > remainders =x rem m,\,a>=x rem m>> can be done using \*s> instead of 2*\*s> multiplications. This leads to a potential gain of afactor , although the remainders are >-bit integers instead of -bit integers, for the time being. Multi-modular reconstruction can also be done faster, as follows, using similar techniques as in. Let \>. Besides the usual binary representation of and the multi-modular representation ,\,a>|)>>=,\,x rem m>|)>>, it is also possible to use the (or ) <\eqnarray*> ||+b*m+b*m*m+\+b>*m*\*m-1>,>>>> where \\>>. Let us now show how to obtain ,\,b>|)>> efficiently from ,\,a>|)>>. Since =b=a>, we must take =a>. Assume that ,\,b> have been computed. For ,1> we next compute <\eqnarray*> >||+b*m+\+b*m*\*m|)> rem m>>>> using =b> and <\eqnarray*> >||+u*m|)> rem m>>|||+u\-\|)>|)> rem m,1|)>.>>>> Notice that ,\,u> can be computed using *> hardware multiplications. We have <\eqnarray*> >||+b*m*\*m|)> rem m=a.>>>> Now the inverse > of *\*m> modulo > can be precomputed. We finally compute <\eqnarray*> >||*-u|)> rem m,>>>> which requires +O> \ multiplications. For small values of , we notice that it may be faster to divide successively by ,\,m> modulo > instead of multiplying with >. In total, the computation of the mixed radix representation ,\,b>|)>> can be done using |2>*+\*s+O*s|)>> multiplications. Having computed the mixed radix representation, we next compute <\eqnarray*> >||+b*m+\+b>*m*\*m-1>>>>> for ,\,1>, using the recurrence relation <\eqnarray*> >||+x*m.>>>> Since \\-i|)>*s*w>>>, the computation of > requires -i|)>*s> multiplications. Altogether, the computation of > from ,\,a>|)>> can therefore be done using |2>*+\*s\\*s*+s|)>> hardware multiplications. For practical applications, we usually wish to work with moduli that fit into one word (instead of words). With the > as in the previous subsection, this means that we need to impose the further requirement that each modulus > can be factored <\eqnarray*> >||*\*m,>>>> with ,\,m\2>>. If this is possible, then the > are called -gentle moduli>. For given bitsizes and 2>, the main questions are now: do such moduli indeed exist? If so, then how to find them? If , then it is easy to construct -gentle moduli =2+\> by taking =-\>, where \\2/2>> is odd. Indeed, <\eqnarray*> -\>||+\|)>*-\|)>>>>> and +\,2-\|)>=gcd+\,2*\|)>=gcd+\,\|)>=gcd,\|)>=1>. Unfortunately, this trick does not generalize to higher values 3>. Indeed, consider aproduct <\eqnarray*> +\|)>*\*+\|)>>||++\+\|)>*2*w>+\>>|||+\+\|)>-+\+\|)>|)>*2*w-1>+\,>>>> where ,\,\> are small compared to >. If the coefficient +\+\> of *w>> vanishes, then the coefficient of *w-1>> becomes the opposite +\+\|)>> of a sum of squares. In particular, both coefficients cannot vanish simultaneously, unless =\=\=0>>. If 2>, then we are left with the option to search -gentle moduli by brute force. As long as is \Preasonably small\Q (say 8>), the probability to hit an -gentle modulus for arandomly chosen > often remains significantly larger than >. We may then use sieving to find such moduli. By what precedes, it is also desirable to systematically take =-\> for \\2/2>>. This has the additional benefit that we \Ponly\Q have to consider /2>> possibilities for >. We will discuss sieving in more detail in the next section. Assuming that we indeed have found -gentle moduli ,\,m>>, we may use the naive algorithms from section to compute ,\,x rem m|)>> from > and for ,\>. Given > for all ,\>, this allows us to compute all remainders > using *s+O*s|)>> hardware multiplications, whereas the opposite conversion only requires *s+O*s|)>> multiplications. Altogether, we may thus obtain the remainders > from and using \*s*+2*s|)>> multiplications. We implemented a sieving procedure in that uses the package with an interface to . Given parameters > and >, the goal of our procedure is to find -gentle moduli of the form <\eqnarray*> ||+\|)>*-\|)>=m*\*m>>>> with the constraints that <\eqnarray*> >|>|>>>|,2>!|)>>||>>> for ,s>, and \\\m>. The parameter is small and even. One should interpret and > as the intended and maximal bitsize of the small moduli >. The parameter > stands for the minimal bitsize of a prime factor of >. The parameter > should be such that > fits into a machine word. In Table below we have shown some experimental results for this sieving procedure in the case when , , =25> and =4>. For \1000000>, the table provides us with >, the moduli ,\,m>, as well as the smallest prime power factors of the product. Many hits admit small prime factors, which increases the risk that different hits are not coprime. For instance, the number divides both -311385> and -376563>, whence these -gentle moduli cannot be selected simultaneously (except if one is ready to sacrifice a few bits by working modulo -311385,2-376563|)>> instead of -311385|)>\-376563|)>>). In the case when we use multi-modular arithmetic for computations with rational numbers instead of integers (see5 and, more particularly, section5.10>), then small prime factors should completely be prohibited, since they increase the probability of divisions by zero. For such applications, it is therefore desirable that ,\,m> are all prime. In our table, this occurs for =57267> (we indicated this by highlighting the list of prime factorsof ). In order to make multi-modular reduction and reconstruction as efficient as possible, adesirable property of the moduli > is that they either divide -\> or +\>. In our table, we highlighted the > for which this happens. We notice that this is automatically the case if ,\,m> are all prime. If only a small number of> (say a single one) do not divide either -\> or +\>, then we remark that it should still be possible to design reasonably efficient algorithms for multi-modular reduction and reconstruction. Another desirable property of the moduli \\\m> is that > is as small as possible: the spare bits can for instance be used to speed up matrix multiplication modulo >. Notice however that one \Poccasional\Q large modulus > only impacts on one out of modular matrix products; this alleviates the negative impact of such moduli. We refer to section below for more details. For actual applications, one should select gentle moduli that combine all desirable properties mentioned above. If not enough such moduli can be found, then it it depends on the application which criteria are most important and which ones can be released. <\big-table|||||||||>>|>>|>>|>>|>>|>>|>>|>,p>,\>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||,151,1879,\>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>>>>>> List of -gentle moduli for , =25>, =4> and \1000000>. , and >> Ideally speaking, we want to be as large as possible. Furthermore, in order to waste as few bits as possible, > should be close to the word size (or half of it) and -w> should be minimized. When using double precision floating point arithmetic, this means that we wish to take \>. Whenever we have efficient hardware support for integer arithmetic, then we might prefer >. Let us start by studying the influence of -w> on the number of hits. In Table, we have increased by one with respect to Table. This results in an approximate increase of the \Pcapacity\Q of the modulus . On the one hand, we observe that the hit rate of the sieve procedure roughly decreases by a factor of thirty. On the other hand, we notice that the rare gentle moduli that we do find are often of high quality (on four occasions the moduli ,\,m> are all prime in Table). <\big-table||||||||||>>|>>|>>|>>|>>|>>|>>|>,p>,\>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>>>>>> List of -gentle moduli for , =25>, =4> and \16000000>. Without surprise, the hit rate also sharply decreases if we attempt to increase . The results for and are shown in Table. A further infortunate side effect is that the quality of the gentle moduli that we do find also decreases. Indeed, on the one hand, tends to systematically admit at least one small prime factor. On the other hand, it is rarely the case that each > divides either -\> or +\> (this might nevertheless happen for other recombinations of the prime factors of , but only modulo a further increase of >). <\big-table|||>>|>>|>>|>>|>>|>>|>>|>>|>>|>,p>,\>>>||||||||||>>>||||||||||>>>||||||||||>>>||||||||||>>>||||||||||>>>||||||||||>>>||||||||||>>>||||||||||>>>||||||||||>>>||||||||||>>>>>>>> List of -gentle moduli for , =25>, =4> and \10000000>. An increase of > while maintaining and -w> fixed also results in a decrease of the hit rate. Nevertheless, when going from =25> (floating point arithmetic) to =31> (integer arithmetic), this is counterbalanced by the fact that > can also be taken larger(namely \2>>); see Table for a concrete example. When doubling and > while keeping the same upper bound for >, the hit rate remains more or less unchanged, but the rate of high quality hits tends to decrease somewhat: see Table. It should be possible to analyze the hit rate as a function of the parameters , , > and> from a probabilistic point of view, using the idea that a random number is prime with probability >. However, from a practical perspective, the priority is to focus on the case when \64>. For the most significant choices of parameters \w\w\64> and, it should be possible to compile full tables of gentle moduli. Unfortunately, our current implementation is still somewhat inefficient for \32>. Ahelpful feature for upcoming versions of would be a function to find all prime factors of an integer below a specified maximum >> (the current version only does this for prime factors that can be tabulated). <\big-table|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||>>|>>|>>|>>|>>|>>|>>|>,p>,\>>>||||||||,1480933,\>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>||||||||>>>>>>>> List of -gentle moduli for , =31>, =4> and \1600000>. Followed by some of the next gentle moduli for which each > divides either -\> or +\>. <\big-table|||||||||||||||||||||||||||||||||||||||||||||>>|>>|>>|>>|>>|>>|>,p>,\>>>||||>>|||>>>||||>>|||>>>||||>>|||>>>||||>>|||>>>||||>>|||>>>||||>>|||>>>||||>>|||>>>||||>>|||>>>|||||||>||||>>|||>>>||||>>|||>>>||||>>|||>>>||||>>|||>>>||||>>|||>>>>>>>> List of -gentle moduli for , =50>, =4> and \200000>. Followed by some of the next gentle moduli for which each > divides either -\> or +\>. Let us finally return to our favourite application of multi-modular arithmetic to the multiplication of integer matrices \r>>. From a practical point of view, the second step of the algorithm from the introduction can be implemented very efficiently if> fits into the size of a word. When using floating point arithmetic, this means that we should have \2> for all. For large values of , this is unrealistic; in that case, we subdivide the r> matrices into smaller \r> matrices with *m\2>. The fact that> may depend on is very significant. First of all, the larger we can take >, the faster we can multiply matrices modulo >. Secondly, the > in the tables from the previous sections often vary in bitsize. It frequently happens that we may take all > large except for the last modulus>>. The fact that matrix multiplications modulo the worst modulus >> are somewhat slower is compensated by the fact that they only account for one out of every > modular matrixproducts. Several of the tables in the previous subsections were made with the application to integer matrix multiplication in mind. Consider for instance the modulus *\*m=2-656997> from Table. When using floating point arithmetic, we obtain \82713>, \1939>, \140>, \61>, \14> and\13>. Clearly, there is a trade-off between the efficiency of the modular matrix multiplications (high values of > are better) and the bitsize\*w> of (larger capacities are better). <\bibliography|bib|plain|all.bib> <\bib-list|10> J.-C. Bajard, M.E. Kaihara, and T.Plantard. Selected rns bases for modular multiplication. In , pages 25\U32, 2009. D.Bernstein. Scaled remainder trees. Available from , 2004. A.Borodin and R.T. Moenck. Fast modular transforms. , 8:366\U386, 1974. A.Bostan, G.Lecerf, and É.Schost. Tellegen's principle into practice. In , pages 37\U44. ACM Press, 2003. A.Bostan and É.Schost. Polynomial evaluation and interpolation on special sets of points. , 21(4):420\U446, August 2005. Festschrift for the 70th Birthday of Arnold Schönhage. J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. , 19:297\U301, 1965. J.Doliskani, P.Giorgi, R.Lebreton, and É. Schost. Simultaneous conversions with the Residue Number System using linear algebra. Technical Report, HAL, 2016. C.M. Fiduccia. Polynomial evaluation via the division algorithm: the fast fourier transform revisited. In A.L. Rosenberg, editor, , pages 88\U93, 1972. J.vonzur Gathen and J.Gerhard. . Cambridge University Press, New York, NY, USA, 3rd edition, 2013. J.vander Hoeven. Faster Chinese remaindering. Technical report, HAL, 2016. . J.vander Hoeven, G.Lecerf, B.Mourrain, etal. Mathemagix, 2002. . J.vander Hoeven, G.Lecerf, and G.Quintin. Modular SIMD arithmetic in Mathemagix. , 43(1):5:1\U5:37, 2016. A.Karatsuba and J.Ofman. Multiplication of multidigit numbers on automata. , 7:595\U596, 1963. R.T. Moenck and A.Borodin. Fast modular transforms via division. In , pages 90\U96, Univ. Maryland, College Park, Md., 1972. The PARIGroup, Bordeaux. , 2012. Software available from . <\initial> <\collection> <\attachments> <\collection> <\associate|bib-bibliography> <\db-entry|+1xV2GLimqg5PDv|book|GaGe13> <|db-entry> J. > <\db-entry|+1xV2GLimqg5PG1|article|Kar63> <|db-entry> J. > <\db-entry|+GMDX9xKJbGQwPe|article|CT65> <|db-entry> J. W. > <\db-entry|+GMDX9xKJbGQwPs|inproceedings|Fid72> <|db-entry> > > <\db-entry|+GMDX9xKJbGQwQk|inproceedings|MB72> <|db-entry> A. > <\db-entry|+GMDX9xKJbGQwPF|article|BM74> <|db-entry> R. T. > <\db-entry|+1xV2GLimqg5PB0|inproceedings|BoLeSc:2003:tellegen> <|db-entry> G. É. > <\db-entry|+esO3CmteJoJvoU|misc|Bern04> <|db-entry> > > <\db-entry|+esO3CmteJoJvoZ|techreport|vdH:chinese> <|db-entry> > > <\db-entry|+6J5VwJjgyAB6eF|inproceedings|BKP09> <|db-entry> M. E. T. > <\db-entry|+3nRmbmrAIRoBA1|techreport|DGLS16> <|db-entry> P. R. É. > > <\db-entry|+esO3CmteJoJvoV|article|vdH:simd> <|db-entry> G. G. > <\db-entry|+1xV2GLimqg5PBA|article|BoSc05> <|db-entry> É. > <\db-entry|+1xV2GLimqg5PQ0|misc|vdH:mmx> <|db-entry> G. B. > > <\db-entry|+1xV2GLimqg5PJI|manual|PARI> <|db-entry> Group> > <\references> <\collection> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > <\auxiliary> <\collection> <\associate|bib> GaGe13 Kar63 CT65 Fid72 MB72 BM74 BoLeSc:2003:tellegen Bern04 vdH:chinese BKP09 BKP09 DGLS16 vdH:simd BKP09 BoSc05 vdH:mmx PARI GaGe13 <\associate|table> >|> List of |6>-gentle moduli for |w=22>, |w=25>, |\=4> and |\\1000000>. |> >|> List of |6>-gentle moduli for |w=23>, |w=25>, |\=4> and |\\16000000>. |> >|> List of |8>-gentle moduli for |w=22>, |w=25>, |\=4> and |\\10000000>. |> >|> List of |6>-gentle moduli for |w=28>, |w=31>, |\=4> and |\\1600000>. Followed by some of the next gentle moduli for which each |m> divides either |2-\> or |2+\>. |> >|> List of |6>-gentle moduli for |w=44>, |w=50>, |\=4> and |\\200000>. Followed by some of the next gentle moduli for which each |m> divides either |2-\> or |2+\>. |> <\associate|toc> |math-font-series||1Introduction> |.>>>>|> |math-font-series||2Chinese remaindering> |.>>>>|> |2.1The Chinese remainder theorem |.>>>>|> > |2.2Modular arithmetic |.>>>>|> > |2.3Naive multi-modular reduction and reconstruction |.>>>>|> > |math-font-series||3Gentle moduli> |.>>>>|> |3.1The naive algorithms revisited for special moduli |.>>>>|> > |3.2Combining special moduli into gentle moduli |.>>>>|> > |math-font-series||4The gentle modulus hunt> |.>>>>|> |4.1The sieving procedure |.>>>>|> > |4.2Influence of the parameters |s>, |w> and |w> |.>>>>|> > |4.3Application to matrix multiplication |.>>>>|> > |math-font-series||Bibliography> |.>>>>|>