> <\body> symmetric and lattice polynomials>||<\author-misc> This work has been partly supported by the grant of the Région Ile-de-France, NSERC and the CRC program. |<\author-affiliation> Laboratoire d'informatique UMR 7161 CNRS École polytechnique 91128 Palaiseau Cedex, France |>>||<\author-affiliation> LIRMM UMR 5506 CNRS Université de Montpellier II Montpellier, France > \; >>||<\author-affiliation> Computer Science Department Western University London, Ontario Canada |>>> <\abstract> In this paper, we consider the problem of efficient computations with structured polynomials. We provide complexity results for computing the Fourier Transform and the Truncated Fourier Transform of symmetric polynomials, and for multiplying polynomials supported on a lattice. Fast computations with multivariate polynomials and power series have been of fundamental importance since the early ages of computer algebra. The representation is an important issue which conditions the performance in an intrinsic way; see for some historical references. It is customary to distinguish three main types of representations: dense, sparse, and functional. A dense representation is made of a compact description of the support of the polynomial and the sequence of its coefficients. The main example concerns block supports -- it suffices to store the coordinates of two opposite vertices. In a dense representation all the coefficients of the considered support are stored, even if they are zero. If a polynomial has only a few non-zero terms in its bounding block, we shall prefer to use a sparse representation which stores only the sequence of the non-zero terms as pairs of monomials and coefficients. Finally, a functional representation stores a function that can produce values of the polynomial at any given point. This can be a pure blackbox (which means that its internal structure is not supposed to be known) or a specific data structure such as (see Chapter 4 of, for instance, and for a concrete library for the manipulation of polynomials with a functional representation).
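As a toy illustration (ours, not from the paper), the three kinds of representations can be sketched in Python; the names and the encoding of the dense block are our own choices:

```python
# Dense with block support {0,1} x {0,1,2}: store the two opposite corners
# of the block plus ALL coefficients, zero or not, in row-major order.
# The polynomial is 1 + 3*x0*x1^2 in two variables.
dense = {"corners": ((0, 0), (1, 2)),
         "coeffs": [1, 0, 0, 0, 0, 3]}    # index (i0, i1) -> i0*3 + i1

# Sparse: only the non-zero terms, as exponent-vector/coefficient pairs.
sparse = {(0, 0): 1, (1, 2): 3}

# Functional: a black box producing values at any given point.
functional = lambda x0, x1: 1 + 3 * x0 * x1 ** 2

def eval_sparse(p, pt):
    """Evaluate a sparse bivariate polynomial at a point."""
    return sum(c * pt[0] ** e[0] * pt[1] ** e[1] for e, c in p.items())

# All three agree at any evaluation point, e.g. (2, 5):
assert eval_sparse(sparse, (2, 5)) == functional(2, 5) == 151
```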
For dense representations with block supports, it is classical that the algorithms used for the univariate case can be naturally extended: the naive algorithm, Karatsuba's algorithm, and even Fast Fourier Transforms can be applied recursively in each variable, with good performance. Another classical approach is the Kronecker substitution which reduces the multivariate product to one variable only; for all these questions, we refer the reader to classical books such as. When the number of variables is fixed and the partial degrees tend to infinity, these techniques lead to softly linear costs. After the discovery of sparse interpolation, probabilistic algorithms with a quasi-linear complexity have been developed for sparse polynomial multiplication. It has recently been shown that such asymptotically fast algorithms may indeed become more efficient than naive sparse multiplication. In practice, however, it frequently happens that multivariate polynomials with a dense flavor do not admit a block support. For instance, it is common to consider polynomials of a given total degree. In a recent series of works, we have studied the complexity of polynomial multiplication in this ``semi-dense'' setting. In the case when the supports of the polynomials are initial segments for the partial ordering on >, the truncated Fourier transform is a useful device for the design of efficient algorithms. Besides polynomials with supports of a special kind, we may also consider what we will call ``structured polynomials''. By analogy with linear algebra, such polynomials carry a special structure which might be exploited for the design of more efficient algorithms. In this paper, we turn our attention to a first important example of this kind: polynomials which are invariant under the action of certain matrix groups. We consider only two special cases: finite subgroups of > and finite groups of diagonal matrices.
These cases are already sufficient to address questions raised in celestial mechanics ; it is hoped that more general groups can be dealt with using similar ideas. In the limited scope of this paper, our main objective is to prove complexity results that demonstrate the savings induced by a proper use of the symmetries. Our complexity analyses take into account the number of arithmetic operations in the base field, and as often, we consider that operations with groups and lattices take a constant number of operations. A serious implementation of our algorithms would require an improved study of these aspects. Of course, there already exists an abundant body of work on some of these questions. FFT algorithms date back to , with contributions as recent as , but are dedicated to crystallographic symmetries. A more general framework due to was recently revisited under the point of view of high-performance code generation ; our treatment of permutation groups is in a similar spirit, but to our understanding, these previous papers do not prove results such as those we give below (and they only consider the FFT, not its truncated version). Similarly, our results on diagonal groups, which fit in the general context of FFTs over lattices, use techniques similar to those in a series of papers initiated by and continued as recently as , but the actual results we prove are not in those references. We will work over an effective base field (or ring) > with sufficiently many roots of unity; the main objects are polynomials |]>\\,\,x|]>> in variables over >. For any =,\,\|)>\\> and \|]>>, we denote by >> the monomial >*\*x>> and by >> the coefficient of in >>. The support > of a polynomial \|]>> is the set of exponents \\> such that >\0>. For any subset of >, we define |]>> as the polynomials with support included in . As an important special case, when \,d-1|}>>, we denote by |]>\\|]>> the set of polynomials with partial degree less than in all variables.
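The Kronecker substitution recalled earlier reduces a multivariate product to a univariate one; here is a minimal bivariate sketch (helper names are ours, and the naive univariate product stands in for the FFT-based one that would be used at large sizes):

```python
def kronecker_mul(P, Q, d):
    """Multiply two sparse bivariate polynomials with partial degrees < d
    by the substitution x1 -> x0**(2*d - 1), under which distinct product
    exponents stay distinct. Polynomials are dicts (e0, e1) -> coeff."""
    N = 2 * d - 1                        # room for product degrees < 2d - 1
    def pack(p):                         # bivariate -> univariate coeffs
        return {e0 + N * e1: c for (e0, e1), c in p.items()}
    def unpack(u):                       # univariate -> bivariate
        return {(k % N, k // N): c for k, c in u.items() if c}
    a, b = pack(P), pack(Q)
    prod = {}
    for i, c in a.items():               # naive univariate convolution
        for j, e in b.items():
            prod[i + j] = prod.get(i + j, 0) + c * e
    return unpack(prod)

# (x0 + x1) * (x0 - x1) = x0^2 - x1^2, with d = 2:
P = {(1, 0): 1, (0, 1): 1}
Q = {(1, 0): 1, (0, 1): -1}
assert kronecker_mul(P, Q, 2) == {(2, 0): 1, (0, 2): -1}
```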
Let \\> be a primitive th root of unity (in Section, it will be convenient to write this root *\/d>>). >For any =,\,\|)>\\>, we define >=>,\,\>|)>>. One of our aims is to compute efficiently the map <\equation*> FFT>:|]>>|>|>>>||>|>|)>|)>\\>>>>>>|\> when is a ``structured polynomial''. Here, >> denotes the set of vectors with entries in > indexed by >; in other words, \\>> implies that >\\> for all \\>. Likewise, \\>> denotes the set of matrices with indices in >, that is, \\\\>> implies that ,\>\\> for all ,\\\>. Most of the time, we will take >> with \\>, although we will also need more general lengths of mixed radices *\*p>>. We denote by \\> the inner product of two vectors in >. We also let \|k-1>,0,1,0,\,0|)>>, for k\n>, be the th element of the canonical basis of >. Finally, for \\> and \>>>, we denote by >> the bit reversal of in length and we extend this notation to vectors by |]>=|]>,\,|]>|)>>. Let us first consider the in-place computation of a full -dimensional FFT of length >> in each variable. We first recall the notations and operations of the decimation-in-time variant of the FFT, and we refer the reader to for more details. In what follows, > is a th primitive root of unity. We start with the FFT of a univariate polynomial \>. Decimation in time amounts to decomposing into its even and odd parts, and proceeding recursively, by means of > decimations applied to the variable . For k\d>, we will write =P> for the input and denote by =|)>k\d>> the result after decimation steps, for ,\|}>>. At stage , the decimation is computed using butterflies of span \2-s>>. If \>> is even and belongs to >>, then these butterflies are given by <\eqnarray*> |+j>>>|*\+j*>>>>>>>|||*\>>>||\*\>>>>>>*+j>>>|*\+j>>>>>>.>>>> Putting the coefficients of all these linear relations in a matrix \\\\>>, we get =\*\>. The matrix > is sparse, with at most two non-zero coefficients on each row and each column; up to permutation, it is block-diagonal, with blocks of size 2.
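The univariate decimation-in-time scheme just described can be sketched as follows, over the complex numbers (the function names are ours; the twiddle exponents follow the bit-reversed indexing used in the text):

```python
import cmath

def bitrev(i, bits):
    """Bit reversal of i in the given length."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_dit(a):
    """In-place decimation-in-time FFT over C: natural-order input,
    values P(w^{[k]}) in bit-reversed order on output. Butterflies at
    stage s have span 2^(ell - s), as in the text."""
    n = len(a)
    ell = n.bit_length() - 1
    w = cmath.exp(2j * cmath.pi / n)
    for s in range(1, ell + 1):
        span = n >> s                        # 2^(ell - s)
        for i in range(0, 1 << s, 2):        # even block indices
            tw = w ** (bitrev(i, s) * span)  # twiddle for this butterfly
            for j in range(span):
                u = a[i * span + j]
                v = a[(i + 1) * span + j] * tw
                a[i * span + j] = u + v
                a[(i + 1) * span + j] = u - v
    return a

# Check a degree-7 example against direct evaluation:
coeffs = [1.0, 2, 0, -1, 3, 0, 5, -2]
vals = fft_dit(coeffs[:])
w = cmath.exp(2j * cmath.pi / 8)
for k in range(8):
    direct = sum(c * w ** (bitrev(k, 3) * e) for e, c in enumerate(coeffs))
    assert abs(vals[k] - direct) < 1e-9
```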
After > stages, we get the evaluations of in bit reversed order: <\equation*> \>=P>>|)>. We now adapt this notation to the multivariate case. The computation is still divided into > stages, each stage doing one decimation in every variable ,\,x>. Therefore, we will denote by >\P>> the coefficients of the input for \\> and by >> the coefficients obtained during the th stage, after the decimations in ,\,x> are done, with ,\|}>> and ,n|}>>, so that >=\>>. We abbreviate >\\>> for the coefficients after stages. The intermediate coefficients >> can be seen as evaluations of intermediate polynomials: for every ,\|}>>, \\>> and \\>>, one has <\eqnarray*> *\+\>>||>>|)>|]>>|)>>>>> where >=\\>>P*\+\>*\>> are obtained through an -fold decimation of . Equivalently, the coefficients >> satisfy <\eqnarray*> *\+\>>||\\>>P*\+\>*\|]>\\*\>.>>>> Thus, >>=P|]>>>|)>> yields >> in bit reversed order. For the concrete computation of the coefficients >> at stage from the coefficients >> at stage , we use so-called ``elementary transforms'' with respect to each of the variables ,\,x>. For any ,n|}>>, the coefficients >> are obtained from the coefficients >> through butterflies of span =2-s>> with respect to the variable >; this can be rewritten by means of the formula <\eqnarray*> *\+\>>>|+\|)>*\+\>>>>>>>||||]>*\>>>||\|]>*\>>>>>>**\+\>>>|+\|)>*\+\>>>>>>>>>> for any \\>> with even -th coordinate > and \\>>. Equation() can be rewritten more compactly in matrix form =\*\> where > is a sparse matrix in \\>> which is naturally indexed by pairs of multi-indices in >. Setting =\*\*\\\\\>,> we also obtain a short formula for > as a function of >: <\eqnarray*> >||*\>>>> Remark that the matrices ,\,\> commute pairwise. Notice also that each of the rows and columns of > has at most two non-zero entries and consequently those of > contain at most > non-zero entries. For this reason, we can apply the matrices > (resp. >) within |\|>|)>=\|)>> (resp. |)>>) operations in >.
Finally, the full -dimensional FFT of length >> costs \3/2*\*n*d=3/2*d*log|)>> operations in > (see). <\example> Let us make these matrices explicit for polynomials in variables and degree less than , so that =2>. We start with the first decimation in > whose butterflies of span > are captured by > with =2-s>=2,s=1> and . It takes as input >=\>\P>> and outputs >> for \\=,3|}>>. For any \\=> and \,3|}>>, we let <\eqnarray*> ,\|)>>>>|,\|)>>>>>>>>||||>||1>>>>>*,\|)>>>>|,\|)>>>>>>>.>>>> So, > is a 16 by 16 matrix, which is made of diagonal blocks of ||>||1>>>>>> in a suitable basis. The decimation in > during stage is similar: <\eqnarray*> ,\|)>>>>|,2+\|)>>>>>>>>||||>||1>>>>>*,\|)>>>>|,2+\|)>>>>>>>>>>> for \\> and \\>. Consequently, > is made of diagonal blocks |>||1>>>>>> (in another basis than >). Their product \\*\> corresponds to the operations <\eqnarray*> |,\|)>>>>|,\|)>>>>|,2+\|)>>>>|,2+\|)>>>>>>>>||||||>||||>||||>||||>>>>*,\|)>>>>|,\|)>>>>|,2+\|)>>>>|,2+\|)>>>>>>>>>>> for ,\\\>. Thus > is made of such 4 by 4 matrices on the diagonal (in yet another basis). Note that in general, the matrices > and > are still made of diagonal blocks, but these blocks vary along the diagonal. We can sum this up in the following two algorithms. <\named-specified-algorithm|Butterfly> , degree , stage , index ,\|)>> of the butterfly with \\>>, \\>>, root of unity > and coefficients \\>> for the butterfly (>\\+\|)>*\+\>> for \\>).
the output of the butterfly \\>>\ <|named-specified-algorithm> For ,n> do//decimation in > <\indent> For \\=> such that =0> do <\indent> |]>*\>*>+\>>//butterfly in one variable >=\>+r> +\>=\>-r> Return >> <\named-specified-algorithm|FFT> , degree , > and coefficients \\>> of coefficients >\\>> in bit reversed order <|named-specified-algorithm> For ,\> do//stage <\indent> \d/2> For \\>>, \\-s>>> do//pick a butterfly <\indent> For \\> do >\\+\|)>*\+\>> >=>,\,\>|)>> For \\> do +\|)>*\+\>\\>> Return >> In the last section, we will also use the ring isomorphism <\equation*> |||]>/-1,\,x-1|)>>|>||]>>/>-1,\,x>-1|)>|]>>>>||>|=>|)>|)>\\>>>>>>> with >|)>=\\>>\\>>P*\+\>*\\\*\>|)>*>*\|)>>> for \\>>, and >*\|)>>=>*x|)>>*\*>*x|)>>>. These polynomials generalize the decomposition of a univariate polynomial into -1|)>> and +1|)>>. We could obtain them through a decimation in frequency, but it is enough to remark that we can reconstruct them from the coefficients > thanks to the formula <\equation> Q>|)>=\\>>\|]>*\+\>*>*\|)>>. In this section, we let \> be a permutation group. The group acts on the polynomials |]>> via the map <\equation*> \:|]>>|>||]>>>|,\,x|)>>|>||)>\P>,\,x>|)>>>>>>|\>. We denote by |]>\stab \|]>> the set of polynomials invariant under the action of . Our main result here is that one can save a factor (roughly) > when computing the FFT of an invariant polynomial. Our approach is in the same spirit as the one in , where similar statements on the savings induced by (very general) symmetries can be found. However, we are not aware of published results similar to Theorem below. Let \,x|]>> be a symmetric bivariate polynomial of partial degrees less than , so that =3>; let also \\> be a primitive eighth root of unity. We use this simple example to detail the methods used to exploit the symmetries to decrease the cost of the FFT. The coefficients of are placed on an 8> grid.
The bivariate classical FFT consists in the application of butterflies of size \2-s>> for from to , as in Figure. When some butterflies overlap, we draw only the shades of all but one of them. The result of stage is the set of evaluations >,\>|)>|)>\\>> in bit reversed order. |Classical bivariate FFT> We will remark below that each stage preserves the symmetry of the input. In particular, since our polynomial is symmetric, the coefficients >> at stage are symmetric too, so we only need to compute the coefficients ,\|)>>> for ,\|)>> in the fundamental domain (sometimes called the asymmetric unit) k\k\8>. We choose to compute at stage only the butterflies which involve at least one element of ; the set of indices that are included in a butterfly of span =2-s>> that meets will be called the extension >> of . |Symmetric bivariate FFT> Every butterfly of stage meets the fundamental domain, so we do not save anything there. However, we save out of butterflies at stage and out of butterflies at the third stage. Asymptotically in , we gain a factor two in the number of arithmetic operations in > compared to the classical FFT; this corresponds to the order of the group. The group also acts on > with |)>=>,\,\>|)>> for any \\>. This action is consistent with the action on polynomials since >|)>=\|)>>>. Because acts on polynomials, it acts on the vector of its coefficients. More generally, if \> and \\>>, we denote by \\> the vector defined by >=\|)>>> for all \I>. If \> is stable by , \I>, we define the set of invariant vectors > by \\\:\g\G,\=\|}>>. For any two sets satisfying I\\>, we define the restriction \\> of \\> by |)>>=\>> for all \J>. Recall the definition of the lexicographical order >: for all ,\\\>, \\> if there exists ,n-1|}>> such that =\> for any t\k> and \\>. Finally, for any subset \> stable by , we define a fundamental domain > of the action of on by \\S:\g\G,\\g|)>|}>> together with the projection :\\\> such that |)>|}>\G|}>|)>\|)>>.
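These definitions can be made concrete; the following toy code (our own, for the full symmetric group acting on a small grid) computes a fundamental domain for the lexicographic order and the associated projection:

```python
from itertools import permutations, product

# Sketch for S_2 acting on {0,...,d-1}^2 by permuting coordinates; the
# fundamental domain keeps the lexicographically smallest element per orbit.
d, n = 4, 2
G = list(permutations(range(n)))          # here: all of S_n

def act(g, alpha):
    """Action of a permutation g on an exponent vector alpha."""
    return tuple(alpha[g[i]] for i in range(n))

S = list(product(range(d), repeat=n))
F = [a for a in S if all(a <= act(g, a) for g in G)]   # fundamental domain

def proj(alpha):
    """pi(alpha): the representative of alpha's orbit inside F."""
    return min(act(g, alpha) for g in G)

# Every point projects into F, and F meets each orbit exactly once.
assert all(proj(a) in F for a in S)
assert len(F) == len({proj(a) for a in S})
# For S_2 and d = 4: the pairs (i, j) with i <= j, i.e. 10 of 16 points.
assert len(F) == 10
```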
Let >\|)>> be the fundamental domain associated with the action of on >. Any -symmetric vector \\>> can be reconstructed from its restriction \\> using the formula >=|)>|)>>>. As it turns out, the in-place FFT algorithm from Section admits the important property that the coefficients at all stages are still -symmetric. <\lemma> Let \|]>> and the vectors ,\,\>\\>> be as in Section. Then if \|]>>, ,\,\>\\,G>>. <\proof> Given \|]>>, \\> and G>, we clearly have |)>>=\>>. For any ,\|}>>, ,\\\>>, \\-s>>> and G\\>, we also notice that <\eqnarray*> *2-s>+\|)>>|||)>*2-s>+g|)>>>||)>|]>>|||]>|)>>>||)>\g|)>>||\\.>>>> Hence, using Equation, we get <\eqnarray*> *2-s>+\|)>> >|||)>*2-s>+g|)>>>>|||\\>>P*2-s>+g|)>>*\|)>|]>\\*2-s>>>>|||\\>>P|)>*2-s>+g|)>>*\|)>|]>\g|)>*2-s>>>>|||\\>>P*2-s>+\|)>>*\|]>|)>\g|)>*2-s>>>>|||\\>>P*2-s>+\>*\|]>\\*2-s>>>>|||*2-s>+\>.>>>> Thus, |)>>=\>> for all \\>, whence ,\,\>\\,G>>. This lemma implies that for the computation of the FFT of a -symmetric polynomial , it suffices to compute the projections ,\,\>>. In order to apply formula() for the computation of > as a function of >, it is not necessary to completely reconstruct >, due to the sparsity of the matrix >. Instead, we define the >-expansion> of the set by <\eqnarray*> >>||\\*\:\\F,\\\|}>,>>>> where \\> stands for the ``bitwise exclusive or'' of > and >. For any \\>, the set \\*\:\\\|}>> describes the vertices of the butterfly of span > that includes the point >. Thus, >> is indeed the set of indices of > that are involved in the computation of > via the formula =\*\>. <\lemma> For any \,2-1>|}>>, let <\eqnarray*> |]>>>||\\:\\F,\\\>|}>,>>>> so that >\F|]>>>. Then we have |]>>=\*F|]>>+\>>. <\proof> Assume that \F|]>>>. Then clearly, *\\F>, whence *\+\>\F|]>>>. Conversely, if \F|]>>>, then there exists a \F> with =\\\> and \\>>. Let =|\/\|\>>. For any G>, we have |)>=|g|)>/\|\>\|\/\|\>=\>. Consequently, |\/\|\>\F|]>>>, whence \\*|\/\|\>+\>>.
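The expansion of a fundamental domain can be computed directly from its definition with bitwise xor; a small sketch (names are ours), reusing the S_2 fundamental domain of the running example:

```python
from itertools import product

def expand(F, delta, n):
    """The span-delta expansion of F: all vertices of the span-delta
    butterflies that meet F, i.e. { phi xor delta*eps : phi in F,
    eps in {0,1}^n }, using componentwise bitwise exclusive or."""
    out = set()
    for phi in F:
        for eps in product((0, 1), repeat=n):
            out.add(tuple(p ^ (delta * e) for p, e in zip(phi, eps)))
    return out

# For S_2 with d = 4, F = {(i, j) : i <= j}:
F = [(i, j) for i in range(4) for j in range(4) if i <= j]

# At the stage of span 2, every butterfly meets F: no savings yet.
assert len(expand(F, 2, 2)) == 16
# At span 1, some butterflies can be skipped entirely:
assert (2, 1) not in expand(F, 1, 2)
# The expansion always contains F itself (take eps = 0).
assert set(F) <= expand(F, 1, 2)
```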
For the proof of the next lemma, for \,n|}>>, define =\\:\=\|}>> and =\\j>\.> <\lemma> There exists a constant (depending on ) such that <\eqnarray*> >|\|>-|>|\|>>|>|.>>>> <\proof> With the notation above, we have \\|\|>=d>. Taking >, it follows that j>\|)>\\|\|>\C*d>. On the other hand, the orbit of any element in > under contains exactly > elements and one element in . In other words, \|\|>=|\|>/> and so -|\|>/\C*d>. Finally, -|\|>|)>/\C/*d> which implies *d\-d/\C*d>. Let >>> be the restriction of > to >> and notice that >>> is stable under >, >>|)>\\>>>. Therefore, the restriction >>>> of the map :\>\\>> to vectors in >>> is a >-algebra morphism from >>> to >>>. By construction, we now have >>=\>>>*\>>.> This allows us to compute > as a function of > using <\eqnarray*> >||>>>*\>|)>|)>>>>> where >|)>> denotes the >-expansion >>> of >. <\remark> We could have saved a few more operations by only computing the coefficients >. To do so, we would have performed just the operations of a butterfly corresponding to vertices inside . However, the potential gain of complexity is only linear in the number of monomials >. Formula() yields a straightforward way to compute the direct FFT of a -symmetric polynomial: <\named-specified-algorithm|Symmetric-FFT> , and coefficients \\>> of \|]>> in >\\>> in bit reversed order <|named-specified-algorithm> For ,\> do <\indent> \d/2> For \\>>, \\-s>>> s.t. *\+\\F>> do <\indent> For \\> do >\\+\|)>*\+\|)>>> >=>,\,\>|)>> For \\> do +\|)>*\+\|)>>\\>> Return >> \; <\remark> The main challenge for an actual implementation of our algorithms consists in finding a way to iterate over sets like >> without too much overhead. We give a hint on how to iterate over in Lemma (see also for similar considerations). The inverse FFT can be computed classically by unrolling the loops in the inverse order and inverting the operations of the butterflies.
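As a numerical sanity check of the symmetry-preservation lemma above (a sketch of ours, not the paper's implementation): for a symmetric input, the coefficient array of a 2-D in-place FFT is again S_2-symmetric after every full stage, so only the fundamental domain ever needs to be stored.

```python
import cmath
from itertools import product

d, n, ell = 4, 2, 2
w = cmath.exp(2j * cmath.pi / d)

def bitrev(i, bits):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

grid = list(product(range(d), repeat=n))
# A symmetric input: B[(a, b)] == B[(b, a)].
B = {al: (al[0] + al[1]) + 1j * (al[0] * al[1]) for al in grid}

for s in range(1, ell + 1):
    span = d >> s
    for k in range(n):                           # one decimation per variable
        for al in grid:
            i = al[k] // span
            if i % 2 == 0:                       # even block index along x_k
                partner = al[:k] + (al[k] + span,) + al[k + 1:]
                tw = w ** (bitrev(i, s) * span)
                u, v = B[al], B[partner] * tw
                B[al], B[partner] = u + v, u - v
    # after each full stage the coefficient array is symmetric again
    assert all(abs(B[(a, b)] - B[(b, a)]) < 1e-9 for a, b in grid)

# the final values are the evaluations at bit-reversed powers of w
def P(x, y):
    return sum(((a + b) + 1j * a * b) * x ** a * y ** b for a, b in grid)
for al in grid:
    pt = tuple(w ** bitrev(c, ell) for c in al)
    assert abs(B[al] - P(*pt)) < 1e-6
```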
The main result of this section is then the following theorem, where > denotes the cost of a full -dimensional FFT of multi-degree . <\theorem> For fixed and , and for \>, the direct (resp. inverse) symmetric FFT can be computed in time <\eqnarray*> >||>*+\|)>.>>>> <\proof> At stage of the computation, the ratio between the computation of > as a function of > and > as a function of > is -s>>|\|>/|\|>>, so that <\eqnarray*> |>>|>|-1>>|\|>+-2>>|\|>+\+|\|>|\*|\|>>.>>>> Lemmas and thus imply <\equation*> >|\|>\|]>>|\|>\|)>|]>>|\|>*|)>\d/+2*C*\*d, whence <\eqnarray*> |>>|>|*d/+2*C*d|\*d>=>+>.>>>> The result follows for \>. Finally, the following lemma gives a description of an ``open'' subset of , characterized only by simple inequalities. Although we do not describe implementation questions in detail here, we point out that this simple characterization would naturally be used in order to iterate over the sets >>, possibly using tree-based techniques as in . <\lemma> Let > be the set of \,n|}>> such that k> and g\G> such that =i> for j> and =k>. For \,n|}>>, define =\\:\\\|}>>. Then <\eqnarray*> \>||\\>H\\.>>>> <\proof> Assume that \\> does not lie in . Then |)>\\> for some G>. Let ,n|}>> be minimal with \j>, whence j> and \\>. Since \\>, it follows that \\>, so \H>. Conversely, assume that \\> does not lie in \\>H>, so that there exists \\> with \H>. Let G> be such that =i> for j> and =k>. Then, |)>\\>. In this section, we deal with polynomials supported on lattices; our main result is an algorithm for the multiplication of such polynomials using Fourier transforms. The first subsection introduces the main objects we will need (a lattice > and its dual >); then, we will give an algorithm, based on Smith normal form computation, for the FFT on what we will call a fundamental domain, and we will finally deduce a multiplication algorithm for the general case.
The techniques we use, based on Smith normal form computation, can be found in several papers, originating from ; in particular, the results in Section are essentially in (in a more general form). Assume that > admits a primitive th root of unity for any order 1>, which we will denote by *\/k>>. Let > be a free >-submodule of > of rank , generated by the vectors ,\,\\\>. Then <\eqnarray*> |]>>>||\|]>:supp P\\|}>>>>> is a subring of |]>>, and we will call elements of |]>>> lattice polynomials. First, we show that these polynomials form the ring of invariants of a diagonal matrix group. The set > acts on |]>>: for any =,\,\|)>\\>, the isomorphism >> of |]>> is given by <\eqnarray*> >:P,\,x|)>>|>|*\*\>*x,\,\*\*\>*x|)>.>>>> Note that +\>=\>> for any \\>. The action of >> on the monomial >> is given by >>|)>=\*\*\\|)>>*\>>. In particular, all elements of |]>>> are invariant under the action of >> if and only if <\eqnarray*> \*\>|>|,>>>> where \\n>> is the matrix with columns ,\,\>, that is =|)>>. Let > be the dual (or reciprocal) lattice of >, that is the set of all \\> satisfying Equation. A basis of > is given by the columns of =\\\n>>. Let be the group of actions >:\\\|}>>. It is generated by >|}>i\n>>, where > are the columns of >. From Equation, we deduce that is the diagonal matrix group of all actions >> which leave |]>>> invariant. Conversely, because monomials are mapped to monomials by elements of , the ring of invariants is spanned by the monomials >> with \*\\\>, that is, > belongs to the dual of >. Since the dual of the dual of a lattice is the lattice itself, we deduce that \\> and |]>>=\|]>>. Note that only > modulo n>> matters in determining the group . <\example> Consider the lattice > generated by => and =>, and let =|>||>>>>>. The lattice polynomials |]>>> are the polynomials ,x*x,x|]>>. We have =|>||>>>>>, so is the group generated by >: P,x|)>\P,-x|)>> and >=Id>. The lattice polynomials |]>>=\,x*x,x|]>> are those polynomials invariant under the symmetry ,x|)>\P,-x|)>>.
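The dual basis in an example of this kind can be checked with exact rational arithmetic; a sketch (helper names ours) for the lattice generated by (2, 0) and (1, 1), whose invariant ring is spanned by x1^2, x1*x2, x2^2:

```python
from fractions import Fraction

# Columns lam1 = (2, 0) and lam2 = (1, 1) generate Lambda.
lam = [[Fraction(2), Fraction(1)],
       [Fraction(0), Fraction(1)]]

def inv2(m):
    """2x2 inverse over the rationals."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

# The dual lattice has basis the columns of (Lambda^{-1})^T.
inv = inv2(lam)
dual = [[inv[j][i] for j in range(2)] for i in range(2)]   # transpose

# Dual basis columns: (1/2, -1/2) and (0, 1).
assert [dual[i][0] for i in range(2)] == [Fraction(1, 2), Fraction(-1, 2)]
assert [dual[i][1] for i in range(2)] == [Fraction(0), Fraction(1)]

def invariant(alpha):
    """x^alpha is invariant under all sigma_delta, delta a dual basis
    column, iff delta . alpha is an integer for each column."""
    return all((dual[0][j] * alpha[0] + dual[1][j] * alpha[1]).denominator == 1
               for j in range(2))

# Exactly the lattice monomials are invariant:
assert invariant((2, 0)) and invariant((1, 1)) and invariant((0, 2))
assert not invariant((1, 0))        # x1 alone is not a lattice monomial
```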
Given =,\,\|)>>, we define <\eqnarray*> |]>>>||\|]>:deg>\\,\,deg>\\|}>>>||]>,\>>|||]>>\\|]>>.>>>> For ,n|}>>, let \0> be minimal such that >\\|]>>>. We call the block >\\>\\\\>> a fundamental domain for >. In this subsection, we consider the computation of the FFT at order > of a polynomial \|]>,\>>. In what follows, we set =\*\/p>> for each , so that we have to show how to compute the set of evaluations >|)>=P>,\,\>|)>> for \\>>. If we proceeded directly, we would compute more evaluations than the number of monomials of , which is \\>|\|>>. We show how to reduce the problem to computing exactly \\>|\|>> evaluations. For any =,\,\|)>\\>, notice that <\eqnarray*> >|)>>||*\*\/p>,\,\*\*\/p>|)>>>|||*\*/p+\|)>>,\,\*\*/p+\|)>>|)>.>>>> Therefore, we would only need to consider the evaluations with multi-indices in *\,\,1/p*\|)>> modulo >. There are only \\>|\|>> such evaluations but, as in Section, we would have to expand the fundamental domain of evaluations at each stage of the FFT to compute the butterflies. Instead, we propose a more direct method with no domain expansion. We show that, regarding the evaluations, polynomials in |]>>> can be written >,\,\>|)>> where ,\,\> is a basis of > and the evaluation can be done directly in this rewritten form. We introduce the notation >\,p-1|}>> for the remainder of \> by \>>. If =,\,\|)>\\> and =,\,\|)>\>|)>>, we let |\>>\|\>>,\,|\>>|)>>. This notation is motivated by the remark that >>|)>> depends only on the class of > and > modulo *\,\,p*\|)>>. <\lemma> There exists a basis ,\,\|)>> of > and a basis ,\,\|)>> of > such that <\eqnarray*> >|)>>||*\*\/q>>>>> where =\|\>>>> and the >'s are positive integers satisfying \q\\\q>. <\proof> Let us consider the lattice of the exponents of >>|)>|)>i\n>> for \\>. If =,\,\|)>\\>, then <\equation*> \>>|)>=\*\*/p,\,\/p|)>\\|]>>=\*\*\|)>/p,\,|)>/p|)>|]>>. We define the lattice > spanned by the columns of =\*>||>||>|>|||>>>>>>, that is =|)>/p>.
We want to take the Smith normal form of > but the coefficients of > are not necessarily integers. So we multiply the matrix by \LCM,\,p|)>>, take the Smith normal form and divide by >. Therefore there exists \,\\GL|)>> and integers \\\d\d> such that <\equation> \*\*\=/\>||>||>|>|||/\>>>>>. Let us prove that \\>. By definition of ,\,p>, there exists \\n>> such that *\=>||>||>|>|||>>>>>>. Thus we have \*\*>||>||>|>|||>>>>>=Id>, implying that *\=Id>, and the result follows. Because \\>, we have \\> and by setting \\/d\\>>, we get \q\\\q>. From a geometric point of view, the equality in Equation gives the existence of the two required bases. The columns of > give the basis ,\,\|)>> of > and the columns of \\*\> give the basis ,\,\|)>> of >. To sum up, we have <\equation> \*>||>||>|>|||>>>>>*\=>||>||>|>|||>>>>> which is the matrix form of Equation. <\proposition> The following ring morphism > <\equation*> |]>/>-1,\,y>-1|)>>|>||]>/>-1,\,x>-1|)>>>|>|>||\>>>>>>>> is well-defined, injective and its image is |]>,\>>. Moreover, if > then <\eqnarray*> *\+\+\*\>|)>>||*\*\/q>,\,\*\*\/q>|)>.>>>> <\proof> First, we prove that > is well defined. To this end, we have to check that >-1|)>=\*|\>>>-1=0> modulo >-1,\,x>-1|)>>. It is sufficient to prove that *|\>>\*\,\,p*\|)>>, which follows from Equation. Then we prove Equation. Let \|]>> and |)>\\|)>>. As a consequence of Lemma, one has <\eqnarray*> *\+\+\*\>|)>>||*\*\/q>,\,\*\*\/q>|)>.>>>> Now, we prove that > is injective. Indeed, if \|]>> satisfies =0> then for all \\>, one has >|)>=0>. Using Equation, we get that for all ,\,\\\>, *\*\/q>,\,\*\*\/q>|)>=0>. As a result, belongs to the ideal generated by >-1,\,y>-1|)>>. Finally, it is clear that the image of > is included in |]>,\>>. Conversely, let >\\|]>,\>>. We write =\*|\>>> with \\> and define |\>=|\>>*|\>>>. Now, because the lattice *\*\\\\q*\*\> is included in *\,\,p*\|)>>, we have that |\>>,\,|\>>|)>>|)>=x|\>>=x>> modulo >-1,\,x>-1|)>>.
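For completeness, here is a self-contained sketch of Smith normal form computation over the integers (the argument above only needs its existence); the implementation below is our own and tracks the unimodular transforms U and V explicitly:

```python
def smith_normal_form(A):
    """Return (U, S, V) with U*A*V = S diagonal, d1 | d2 | ..., and
    U, V unimodular; A is an integer matrix given as a list of rows."""
    S = [row[:] for row in A]
    m, n = len(S), len(S[0])
    U = [[int(i == j) for j in range(m)] for i in range(m)]
    V = [[int(i == j) for j in range(n)] for i in range(n)]

    def addr(i, j, c):                 # row i += c * row j (on S and U)
        S[i] = [a + c * b for a, b in zip(S[i], S[j])]
        U[i] = [a + c * b for a, b in zip(U[i], U[j])]

    def addc(j, k, c):                 # col j += c * col k (on S and V)
        for r in S: r[j] += c * r[k]
        for r in V: r[j] += c * r[k]

    def swapr(i, j):
        S[i], S[j] = S[j], S[i]; U[i], U[j] = U[j], U[i]

    def swapc(i, j):
        for r in S: r[i], r[j] = r[j], r[i]
        for r in V: r[i], r[j] = r[j], r[i]

    def clear(t):                      # zero out row t and column t off-pivot
        while True:
            done = True
            for i in range(t + 1, m):
                if S[i][t]:
                    addr(i, t, -(S[i][t] // S[t][t]))
                    if S[i][t]: swapr(t, i)   # remainder: smaller pivot
                    done = False
            for j in range(t + 1, n):
                if S[t][j]:
                    addc(j, t, -(S[t][j] // S[t][t]))
                    if S[t][j]: swapc(t, j)
                    done = False
            if done:
                return

    for t in range(min(m, n)):
        piv = next(((i, j) for i in range(t, m)
                    for j in range(t, n) if S[i][j]), None)
        if piv is None:
            break
        swapr(t, piv[0]); swapc(t, piv[1]); clear(t)

    changed = True                     # enforce d1 | d2 | ...
    while changed:
        changed = False
        for t in range(min(m, n) - 1):
            if S[t][t] and S[t + 1][t + 1] % S[t][t]:
                addc(t, t + 1, 1); clear(t); changed = True

    for t in range(min(m, n)):         # normalize signs
        if S[t][t] < 0:
            S[t] = [-x for x in S[t]]; U[t] = [-x for x in U[t]]
    return U, S, V

A = [[2, 4], [6, 8]]
U, S, V = smith_normal_form(A)
mul = lambda X, Y: [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                     for j in range(len(Y[0]))] for i in range(len(X))]
assert S == [[2, 0], [0, 4]]
assert mul(mul(U, A), V) == S
```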
In particular, the FFT of is uniquely determined by its restriction to evaluations of the form >|)>> with =\*\+\+\*\> and =,\,\|)>\\>>. Notice that Proposition implies that the number *\*q> of evaluations equals the number \\>|\|>> of monomials in |]>,\>>. We define the lattice FFT of to be the vector of values *\+\+\*\>|)>|)>\\>>>. We have thus shown that the computation of the lattice FFT of reduces to the computation of an ordinary multivariate FFT of order >. Assume that \>|)>> is such that \d> for each and consider the computation of the FFT at order > of a lattice polynomial \|]>,\>>. In practice, one often has =p*2>>, and this is what we will assume from now on. For simplicity, we also suppose that =\=\> and denote by > this common value. The results are valid in general, but we would have to perform more decimations in some variables than in others, which would not match the notations of Section. We give here an algorithm for the multiplication of two polynomials ,P\\|]>,\/2>>. We start by doing > FFT stages. These stages preserve the lattice >, because the butterflies have a span which belongs to >. Therefore, we compute only the butterflies whose vertices are in >. After these > stages, we twist the coefficients thanks to Formula and obtain the polynomials >=>>|)>>, for \\>>>. As explained in Section , up to a minor modification (here, the partial degrees are not all the same), these polynomials reduce our multiplication to >> multiplications in |]>,\>/>-1,\,x>-1|)>>. Thanks to Proposition, we are able to perform these latter multiplications as multiplications in |]>>/>-1,\,y>-1|)>>.
<\named-specified-algorithm|Lattice-partial-FFT> , >, > and coefficients \\>> of \|]>,d>> the coefficients >>\>> <|named-specified-algorithm> For ,\> do//> decimation steps in each variable <\indent> \d/2> For \\>>, \\\\>> do//a butterfly in > <\indent> For \\> do >\\+\|)>*\+\>> >=>,\,\>|)>> For \\> do +\|)>*\+\>\\>> Return > <\named-specified-algorithm|Lattice-FFT-multiplication> , >, > and ,P\\|]>,d>> the coefficients of the product *P> <|named-specified-algorithm> For do <\indent> >=>,\,|)>>|)>\\>|)>> Let \Q>\|]>,\>|)>>>>> obtained from >> Transform > as a vector of |]>>> using Proposition Multiply *Q> in |]>/>-1,\,y>-1|)>|)>>>>> Transform as a vector of |]>,\>> using Proposition Recover >> from |]>,\>|)>>>>> =>,\,>|]>>>|)>\\>|)>> Return > As for symmetric polynomials, the inverse algorithm >> is obtained by reversing the loops and inverting the butterflies of . Let us analyze the cost of this algorithm. It starts with > stages of classical -dimensional FFT, computing only the butterflies whose vertices are in >. This amounts to *n*\*\\|\|>> arithmetic operations in >. The second part consists of the transformations of Equation and Proposition. Since we consider that operations with the lattice > take time >, these transformations take time \\|\|>|)>>. Finally, the componentwise multiplication of |]>/>-1,\,y>-1|)>|)>>>>> costs >*,\,\|)>> where |)>> stands for the arithmetic complexity of polynomial multiplication in |]>>>. Altogether, our multiplication algorithm costs <\equation*> |)>\3/2*n*\*\\|\|>+2>*|)>+\\\|\|>|)>. By contrast, a symmetry-oblivious approach would consist in doing > stages of classical -dimensional FFT and then >> multiplications in |]>/>-1,\,x>-1|)>>. The cost analysis is similar to the previous one, and the cost of the classical approach is |)>\3/2*n*\*d+2>*|)>+\|)>>. The ratio /\\|\|>> is exactly the volume |)>> of the lattice, defined by |)>\det|)>\\>.
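A one-dimensional toy version of this volume-factor saving (ours, not from the paper): polynomials supported on the lattice 2Z, i.e. even polynomials, have volume 2, and their product can be carried out through the rewriting y = x^2 on coefficient lists of half the length:

```python
def mul(a, b):
    """Plain convolution of coefficient lists."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

# Even polynomials given by coefficients of x^0, x^2, x^4, ...; the whole
# point is that the product happens on lists of HALF the length (in y = x^2).
# (1 + x^2) * (2 - x^2 + x^4):
P, Q = [1, 1], [2, -1, 1]
r = mul(P, Q)                     # half-size product in y

# Compare with the full-size product in x:
full = mul([1, 0, 1], [2, 0, -1, 0, 1])
assert full[0::2] == r            # even coefficients match the y-product
assert all(c == 0 for c in full[1::2])   # odd coefficients vanish
```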
Under the superlinearity assumption for the function >, we get that |)>*|)>\|)>> and we deduce the following theorem. <\theorem> For a fixed lattice > and \>, the direct (resp. inverse) lattice FFT can be computed in time <\eqnarray*> |)>>|||)>>*|)>+\|)>.>>>> To conclude this paper, we briefly discuss the extensions of the previous results to the Truncated Fourier Transform (TFT). With the notation of Section, let \> be an initial segment subset of >: for ,\\\>, if \I and \\\\, then \\I>. Given \|]>>, the TFT of is the vector \\> defined by >=P|]>>>|)>>, where > is a root of unity of order . In , van der Hoeven described fast algorithms for computing the TFT and its inverse. For a fixed dimension , the cost of these algorithms is bounded by *log +d|)>> instead of *log d|)>>. In this section, we will outline how a further acceleration can be achieved for symmetric polynomials of the types studied in Sections and. For each \,2>|}>>, let |]>>=\\:\\I,\\\>|}>>. The entire computation of the TFT and its inverse can be represented schematically by a graph >. The vertices of the graph are pairs |)>> with ,\|}>> and \I-s>|]>>>. The edges are between vertices |)>> and |)>> with =\\-s-1>*\|)>> and \\>. The edge is labeled by a constant that we will simply write ,\>> such that <\eqnarray*> >>||>c,\>*P>.>>>> For a direct TFT, we are given >> on input and ``fill out'' the remaining values >> for increasing values of using (). In the case of an inverse TFT, we are given the >>> with \I> on input, as well as the coefficients >=0> with \\\I>. We next ``fill out'' the remaining values >> using the special algorithm described in. Now let be as in Section and assume that is stable under . Then each of the |]>>> is also stable under . Furthermore, any \I> lies in the orbit of some element of F> under the action of .
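The definition of the TFT recalled above can be illustrated directly; the sketch below (ours) computes the full transform and keeps the initial segment, whereas the fast algorithms cited in the text prune the butterfly graph instead:

```python
import cmath

def bitrev(i, bits):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def tft(coeffs, n):
    """TFT of a polynomial with < n coefficients: the n values
    P(w^{[k]}) for k = 0, ..., n-1, where w has order N = 2^ell >= n.
    Naive reference version: evaluate everything, keep the segment."""
    ell = max(1, (n - 1).bit_length())
    N = 1 << ell
    w = cmath.exp(2j * cmath.pi / N)
    P = list(coeffs) + [0.0] * (N - len(coeffs))
    return [sum(c * w ** (bitrev(k, ell) * e) for e, c in enumerate(P))
            for k in range(n)]

# P = 1 + 2x + 3x^2, n = 3: N = 4 and we keep 3 of the 4 outputs.
vals = tft([1, 2, 3], 3)
w = 1j                                  # primitive 4th root of unity
P = lambda x: 1 + 2 * x + 3 * x * x
for k, v in enumerate(vals):
    assert abs(v - P(w ** bitrev(k, 2))) < 1e-9
assert abs(vals[0] - 6) < 1e-9          # P(1)
assert abs(vals[1] - 2) < 1e-9          # P(-1), index 1 is bit-reversed
```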
Let F|)>|]>>=\\:\\I\F,\\\>|}>.> Given a -symmetric polynomial on input, the idea behind the symmetric TFT is to use the restriction > of the above graph > obtained by keeping only those vertices |)>> such that \I|)>-s>|]>>>. The symmetric TFT and its inverse can then be computed by intertwining the above ``filling out'' process with steps in which we compute |)>>=P>> for all \\> and G> such that >> is known but not |)>>>. The computational complexities of the symmetric TFT and its inverse are both proportional to the size |\|>> of >. For many initial domains of interest, it can be shown that |\|>\|\|>/> as soon as gets large. This is for instance the case for =\\:i+\+i\k|}>>, when \>, and where =|log k|\>>. Indeed, using techniques similar to those in the proof of Theorem, we first show that \F|)>|]>>-|)>|]>>/|\|>=\*d|)>>, and then conclude in a similar way. Let us now consider the case of a polynomial \|]>\I>>, with > as in Section. In order to design a fast ``lattice TFT'', the idea is simply to replace by a slightly larger set which preserves the fundamental domains. More precisely, with ,\,p> as in Section, we take <\eqnarray*> ||*i+j,\,p*i+j|)>:\\K,\\\>|}>>>||||i/p|\>,\,|i/p|\>|)>:\\I|}>.>>>> A lattice TFT of order *2>,\,p*2>|)>> can then be regarded as \\>|\|>> TFTs at order >,\,2>|)>> and initial segment , followed by > lattice FFTs on a fundamental domain. Asymptotically speaking, we thus gain a factor >|\|>/\\>|\|>> with respect to a usual TFT with initial segment . In the case when =\\:i+\+i\k|}>>, we finally notice that \> for \>. Let us quickly outline possible generalizations of the results of this paper. It seems that the most general kinds of finite groups for which the techniques in this paper work are finite subgroups of *U>, where > is the group of n> permutation matrices and > the group of diagonal matrices whose entries are all roots of unity. Indeed, any such group acts both on the sets |]>> and on the torus >, where =*i/n>:i,n\\|}>>.
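A small sketch of the enlargement of I into J may be helpful. Assuming the rounding in the definition of K is a floor (so that every point of I is covered by its block; the displayed formula is partly garbled in this excerpt, and the function name is ours), the construction reads:

```python
from itertools import product

def enlarge_segment(I, p):
    """Enlarge a finite set I of points of N^n to the smallest union J of
    complete p-blocks containing I: first scale each point down to its
    block index (set K), then expand every block in full.
    Sketch of the construction in the text, with floor rounding assumed."""
    n = len(p)
    # K = { (i_1 // p_1, ..., i_n // p_n) : i in I }
    K = {tuple(i[l] // p[l] for l in range(n)) for i in I}
    # J = { (p_1*k_1 + j_1, ..., p_n*k_n + j_n) : k in K, 0 <= j_l < p_l }
    J = {tuple(p[l] * k[l] + j[l] for l in range(n))
         for k in K
         for j in product(*(range(pl) for pl in p))}
    return J
```

For I = {(0,0), (1,0), (0,1)} and p = (2,2), this yields the single block {0,1} × {0,1}; by construction |J| is at most the number of blocks times the block volume, which is how the factor gained over a plain TFT arises.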
Even more generally, the results may still hold for closed algebraic subgroups generated by infinitely many elements of *U>. Many other interesting groups can be obtained as conjugates =T*G*T> of groups of the above kind. For certain applications, such as the integration of dynamical systems, the change of coordinates can be done ``once and for all'' on the initial differential equations, after which the results of this paper again apply. It is classical that the FFT of a polynomial with real coefficients can be computed roughly twice as fast as that of a polynomial with complex coefficients. Real polynomials can be regarded as polynomials that are symmetric under complex conjugation: >|)>=|\>>. Some further extensions of our setting are possible by including this kind of symmetry. On some simple examples, we have verified that the ideas of this paper generalize to other evaluation-interpolation models for polynomial multiplication, such as Karatsuba multiplication and Toom-Cook multiplication. We intend to study the above generalizations in more detail in a forthcoming paper. <\small> <\bibliography|bib|tm-plain|all.bib> <\bib-list|33> L. Auslander, J. R. Johnson and R. W. Johnson. An equivariant fast Fourier transform algorithm. Drexel University Technical Report DU-MCS-96-02, 1996. M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. , 301--309. New York, NY, USA, 1988. ACM Press. R. Bergmann. The fast Fourier transform and fast wavelet transform for patterns on the torus. , , 2012. In press. P. Bürgisser, M. Clausen and M. A. Shokrollahi. . Springer-Verlag, 1997. J. Canny, E. Kaltofen and Y. Lakshman. Solving systems of non-linear polynomial equations faster. , 121--128. Portland, Oregon, A.C.M., New York, 1989. ACM Press. D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. , 28:693--701, 1991. J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. , 19:297--301, 1965.
S. Czapor, K. Geddes and G. Labahn. . Kluwer Academic Publishers, 1992. P. Duhamel and M. Vetterli. Fast Fourier transforms: a tutorial review and a state of the art. , 19(4):259--299, April 1990. T. S. Freeman, G. M. Imirzian, E. Kaltofen and Y. Lakshman. DAGWOOD: a system for manipulating polynomials given by straight-line programs. , 14:218--240, 1988. S. Garg and É. Schost. Interpolation of polynomials given by straight-line programs. , 410(27-29):2659--2662, 2009. M. Gastineau and J. Laskar. 1.2.26. TRIP reference manual, IMCCE, 2012. <\with|font-family|tt> http://www.imcce.fr/trip/ . A. Guessoum and R. Mersereau. Fast algorithms for the multidimensional discrete Fourier transform. , 34(4):937--943, 1986. J. von zur Gathen and J. Gerhard. . Cambridge University Press, 2nd edition, 2002. J. van der Hoeven. Relax, but don't be too lazy. , 34:479--542, 2002. J. van der Hoeven. The truncated Fourier transform and applications. J. Gutierrez, , 290--296. Univ. of Cantabria, Santander, Spain, July 4--7, 2004. J. van der Hoeven. Notes on the truncated Fourier transform. 2005-5, Université Paris-Sud, Orsay, France, 2005. J. van der Hoeven. Newton's method and FFT trading. , 45(8):857--878, 2010. J. van der Hoeven and G. Lecerf. On the bit-complexity of sparse polynomial multiplication. , HAL, 2010. , accepted for publication in JSC. J. van der Hoeven and G. Lecerf. On the complexity of blockwise polynomial multiplication. , 211--218. Grenoble, France, July 2012. J. van der Hoeven and É. Schost. Multi-point evaluation in higher dimensions. , HAL, 2010. , accepted for publication in AAECC. J. Johnson and X. Xu. Generating symmetric DFTs and equivariant FFT algorithms. , 195--202. ACM, 2007. S. C. Johnson. Sparse polynomial arithmetic. , 8(3):63--71, 1974. E. Kaltofen and Y. N. Lakshman. Improved sparse multivariate polynomial interpolation algorithms. , 467--474. Springer-Verlag, 1988. E. Kaltofen, Y. N. Lakshman and J.-M. Wiley. Modular rational sparse multivariate polynomial interpolation. , 135--139. New York, NY, USA, 1990. ACM Press. E. Kaltofen, W. Lee and A. A. Lobo.
Early termination in Ben-Or/Tiwari sparse interpolation and a hybrid of Zippel's algorithm. , 192--201. New York, NY, USA, 2000. ACM Press. A. Kudlicki, M. Rowicka and Z. Otwinowski. The crystallographic fast Fourier transform. Recursive symmetry reduction. , A63:465--480, 2007. V. Y. Pan. Simple multivariate polynomial multiplication. , 18(3):183--186, 1994. V. Pan and D. Bini. . Birkhäuser, 1994. A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. , 7:281--292, 1971. D. R. Stoutemyer. Which polynomial representation is best? , 221--243. 1984. L. F. Ten Eyck. Crystallographic fast Fourier transform. , A29:183--191, 1973. A. Vince and X. Zheng. Computing the discrete Fourier transform on a hexagonal lattice. , 28:125--133, 2007.