<\author-address>
LIX, CNRS, École polytechnique, 91128 Palaiseau Cedex, France
</author-address>

<\doc-note>
This work has been supported by the ANR-09-JCJC-0098-01 project, as well as the Digiteo 2009-36HD grant and Région Ile-de-France.
</doc-note>

<\abstract>
The goal of the Mathemagix project is to develop a new and free software for computer algebra and computer analysis, based on a strongly typed and compiled language. In this paper, we focus on the underlying type system of this language, which allows for heavy overloading, including parameterized overloading with parameters in so called ``categories''. The exposition is informal and aims at giving the reader an overview of the main concepts, ideas and differences with existing languages. In a forthcoming paper, we intend to describe the formal semantics of the type system in more detail.
</abstract>

Until the mid nineties, the development of computer algebra systems tended to exploit advances in the area of programming languages, and sometimes even influenced the design of new languages. The Macsyma system was developed shortly after the introduction of Lisp. Symbolic algebra was an important branch of the artificial intelligence project at MIT during the sixties. For a while, Macsyma was the largest program written in Lisp, and motivated the development of better compilers. The Scratchpad system was at the origin of yet another interesting family of computer algebra systems, especially after the introduction of domains and categories as function values and dependent types in Scratchpad II. These developments were at the forefront of language design and type theory. Scratchpad later evolved into the Axiom system. In the A# project, later renamed into Aldor, the language and compiler were redesigned from scratch and further purified. After this initial period, computer algebra systems have been less keen on exploiting new ideas in language design.
One important reason is that a good language for computer algebra is more important for developers than for end users. Indeed, typical end users tend to use computer algebra systems as enhanced pocket calculators, and rarely write programs of substantial complexity themselves. Another reason is specific to the family of systems that grew out of Scratchpad: after IBM's decision to no longer support the development, there has been a long period of uncertainty for developers and users on how the system would evolve. This has discouraged many of the programmers who did care about the novel programming language concepts in these systems. In our opinion, this has led to an unpleasant current situation in computer algebra: there is a dramatic lack of a modern, sound and fast programming language. The major systems Mathematica and Maple are both interpreted and weakly typed (even the classical concept of a closure has been introduced only recently in Maple!), besides being proprietary and very expensive. The Sage system relies on Python and merely contents itself to glue together various existing libraries and other software components. The absence of modern languages for computer algebra is even more critical whenever performance is required. Nowadays, many important computer algebra libraries (such as GMP, MPFR, NTL, FLINT, etc.) are directly written in C or C++. Performance issues are also important whenever computer algebra is used in combination with numerical algorithms. We would like to emphasize that high level ideas can be important even for traditionally low level applications. For instance, in a suitable high level language it should be easy to operate on vectors of, say, 128 bit floating point numbers. Unfortunately, MPFR would have to be completely redesigned in order to make such a thing possible. For these reasons, we have started the design of a new and free software, Mathemagix, based on a compiled and strongly typed language, featuring signatures, dependent types, and overloading.
Although the design has greatly been influenced by Scratchpad and its successors Axiom and Aldor, there are several important differences, as we will see. In this paper, we will focus on the underlying type system. We present an informal overview of this system and highlight in which respects it differs from existing systems. We plan to provide a more detailed formal description of the type system in a future paper. The central idea behind the design of the Mathemagix language is that the declaration of a function is analogous to the statement of a mathematical theorem, whereas the implementation of the function is analogous to giving a proof. Of course, this idea is also central in the area of automated proof assistants, such as Coq or Isabelle. However, Mathemagix only insists on very detailed declarations, whereas the actual implementations need not be formally proven. One consequence of this design philosophy is that program interfaces admit very detailed specifications: although the actual implementations are not formally proven, the combination of various components is sound as long as each of the components fulfills its specification. By contrast, functions in weakly typed systems such as Maple or Mathematica can only be specified in quite vague manners, thereby introducing a big risk of errors when combining several libraries. Another consequence of the design is that it allows for massively overloaded notations. This point is crucial for computer algebra and also the main reason why mainstream strongly typed functional programming languages, such as OCaml or Haskell, are not fully suitable for our applications. In short, we insist on very detailed and unambiguous function declarations, but provide a lot of flexibility at the level of function applications. On the contrary, languages such as OCaml require unambiguous function applications, but excel at assigning types to function declarations in which no types are specified for the arguments.
The type system also allows for a very flat design of large libraries: every function comes with the hypotheses under which it is correct, and can almost be regarded as a module on its own. This is a major difference with respect to Axiom and Aldor, where functionality is usually part of a class or a module. In such more hierarchical systems, it is not always clear where to put a given function. For instance, should a converter between lists and vectors be part of the list or the vector class? In order to make the above discussion about the main design philosophy more concrete, we will consider the very simple example of computing the square of an element in a monoid. In section 2, we will show how such a function would typically be written in various existing languages, and compare with what we would do in Mathemagix. In section 3, we will continue with a more complete description of the primitives of the type system which have currently been implemented in the Mathemagix compiler. We will also discuss various difficulties that we have encountered and some plans for extensions. As stated before, we have chosen to remain quite informal in this paper. Nevertheless, in section 4, we will outline the formal semantics of the type system. The main difficulty is to carefully explain what are the possible meanings of expressions based on heavily overloaded notations, and to design a compiler which can determine these meanings automatically. Given the declarative power of the language, it should be noticed that the compiler will not always be able to find all possible meanings of a program. However, this is not necessarily a problem as long as the compiler never assigns a wrong meaning to an expression. Indeed, given an expression whose meanings are particularly hard to detect, it is not absurd to raise an error, or even to loop forever: in such cases, it will always be possible to make the expression easier to understand by adding some explicit casts.
Fortunately, most natural mathematical notations also have a semantics which is usually easy to determine: otherwise, mathematicians would have a hard job understanding each other in the first place! We will consider a very simple example in order to illustrate the most essential differences between Mathemagix and various existing programming languages: the computation of the square of an element of a monoid. Here we recall that a monoid is simply a set M together with an associative multiplication *: (M, M) -> M. In Mathemagix, we would start with a formal declaration of the concept of a monoid. As in the case of Axiom, this is done by introducing the monoid category:

<\mmx-code>
category Monoid == {
  infix *: (This, This) -> This;
}
</mmx-code>

We may now define the square of an element of a monoid by

<\mmx-code>
forall (M: Monoid)
square (x: M): M == x * x;
</mmx-code>

Given an instance x of any type T with a multiplication infix *: (T, T) -> T, we may then compute the square of x using square x. In our definition of a monoid, we notice that we did not specify the multiplication to be associative. Nothing prevents the user from replacing the definition by

<\mmx-code>
category Monoid == {
  infix *: (This, This) -> This;
  associative: This -> Void;
}
</mmx-code>

This allows the user to assert that the multiplication on a type T is associative, by implementing the ``dummy'' function associative for T. At this point, the type system provides no means for specifying the mathematical semantics of associativity: if we would like to take advantage of this kind of semantics, then we should also be able to automatically verify associativity during type conversions. This is really an issue for automatic theorem provers, which is beyond the current design goals of Mathemagix. Notice nevertheless that it would be quite natural to extend the language in this direction in the further future. As stated in the introduction, a lot of the inspiration for Mathemagix comes from the Aldor system and its predecessors.
In Aldor, the Monoid category would be defined using

<\mmx-code>
Monoid: Category == with {
  *: (%, %) -> %;
}
</mmx-code>

However, the forall primitive inside Mathemagix for the definition of templates does not have an analogue inside Aldor. In Aldor, one would rather define a parameterized class which exports the template. For instance:

<\mmx-code>
Squarer (M: Monoid): with {
  square: M -> M;
} == add {
  square (x: M): M == x * x;
}
</mmx-code>

In order to use the template for a particular class, say Integer, one has to explicitly import the instantiation of the template for that particular class:

<\mmx-code>
import from Squarer (Integer);
</mmx-code>

The necessity to encapsulate templates inside classes makes the class hierarchy in Aldor rather rigid. It also forces the user to think more than necessary about where to put various functions and templates. This is in particular the case for routines which involve various types in a natural way. For instance, where should we put a converter from vectors to lists? Together with other routines on vectors? With other routines on lists? Or in an entirely separate module? The C++ language does provide support for the definition of templates:

<\cpp-code>
template<typename M> M
square (const M& x) {
  return x * x;
}
</cpp-code>

However, as we can see in this example, the current language does not provide a means for requiring M to be a monoid. Several C++ extensions with ``signatures'' or ``concepts'' have been proposed in order to add such requirements. C++ also imposes a lot of restrictions on how templates can be used. Most importantly, the template arguments should be known statically, at compile time. Also, instances of user defined types cannot be used as template arguments. In the above piece of code, we also notice that the argument x is of type const M& instead of M. This kind of interference of low level details with the type system is at the source of many problems when writing large computer algebra libraries in C++.
Although Mathemagix also gives access to various low level details, we decided to follow a quite different strategy in order to achieve this goal. However, these considerations fall outside the main scope of this paper. Mainstream strongly typed functional programming languages, such as OCaml and Haskell, do not provide direct support for operator overloading. Let us first examine the consequences of this point of view in the case of OCaml. First of all, multiplication does not carry the same name for different numeric types. For instance:

<\verbatim-code>
# let square x = x * x;;
val square : int -> int = <fun>
# let float_square x = x *. x;;
val float_square : float -> float = <fun>
</verbatim-code>

At any rate, this means that we somehow have to specify the monoid in which we want to take a square when applying the square function of our example. Nevertheless, modulo acceptance of this additional disambiguation constraint, it is possible to define the analogue of the category Monoid and the routine square:

<\verbatim-code>
module type Monoid =
    sig
      type t
      val mul : t -> t -> t
    end;;

module Squarer =
  functor (El: Monoid) ->
    struct
      let square x = El.mul x x
    end;;
</verbatim-code>

As in the case of Aldor, we need to encapsulate the square function in a special module Squarer. Moreover, additional efforts are required in order to instantiate this module for a specific type, such as int:

<\verbatim-code>
module Int_Monoid =
    struct
      type t = int
      let mul x y = x * y
    end;;

module Int_Squarer = Squarer (Int_Monoid);;

Int_Squarer.square 11111;;
- : int = 123454321
</verbatim-code>

Haskell is similar in spirit to OCaml, but type classes allow for a more compact formulation. The square function would be defined as follows:

<\verbatim-code>
class Monoid a where
    (*) :: a -> a -> a

square x = x * x
</verbatim-code>

In order to enable the square function for a particular type, one has to create an instance of the Monoid type class for this particular type.
For instance, we may endow String with the structure of a monoid by using concatenation as our multiplication:

<\verbatim-code>
instance Monoid String where
    x * y = x ++ y
</verbatim-code>

After this instantiation, we may square the string "hello" using square "hello". In order to run this example in practice, we notice that there are a few minor problems: the operator * is already reserved for standard multiplication of numbers, and one has to use the -XFlexibleInstances option in order to allow for the instantiation of the String type. The nice thing about the above mechanism is that we may instantiate other types as monoids as well and share the name of the multiplication operator among all these instantiations. Haskell style polymorphism thereby comes very close to operator overloading. However, there are some important differences. First of all, it is not allowed to use the same name * inside another type class, such as Group, except when the other type class is explicitly derived from Monoid. Secondly, the user still has to explicitly instantiate the type classes for specific types: in Mathemagix, the type String can automatically be regarded as a Monoid as soon as the operator * is defined on strings. Although the automated proof assistant Coq is not really a programming language, it implements an interesting facility for implicit conversions. This facility is not present in OCaml, on which Coq is based. Besides elementary implicit conversions, such as the inclusion of Z in Q, it is also possible to define parametric implicit conversions, such as the inclusion of R in R[x] for any ring R. Moreover, implicit conversions can be composed into chains of implicit conversions. One major problem with implicit conversions is that they quickly tend to become ambiguous (and this explains why they were entirely removed from Axiom and Aldor; see the discussion of conversions in section 3 below). For example, the implicit conversion Z -> Q[x] may typically be achieved in two different ways as the succession of two implicit conversions; namely, as Z -> Q -> Q[x] or as Z -> Z[x] -> Q[x].
In Coq, such ambiguities are resolved by privileging the implicit conversions which are declared first. It is interesting to notice that this makes sense in an automated proof assistant: the correctness of the final proof does not really depend on the way we performed the implicit conversions. However, such ambiguities are more problematic for general purpose programming languages: the conversion Z -> Q -> Q[x] is usually more efficient than Z -> Z[x] -> Q[x], so we really want to enforce the most efficient solution. Currently, Coq does not provide genuine support for overloading. If such support existed, then a similar discussion would probably apply. Essentially, the difference between Mathemagix and classical strongly typed functional languages such as OCaml and Haskell is explained by the following observation: if we want to be able to declare the square function simply by writing

<\verbatim-code>
square x = x * x
</verbatim-code>

without specifying the type of x, then the symbol * should not be too heavily overloaded in order to allow the type system to determine the type of square. In other words, no sound strongly typed system can be designed which allows both for highly ambiguous function declarations and highly ambiguous function applications. Whether the user prefers a type system which allows for more freedom at the level of function declarations or function applications is a matter of personal taste. We may regard OCaml and Haskell as prominent members of the family of strongly typed languages which accommodate a large amount of flexibility at the declaration side. But if we are rather looking for high expressiveness at the function application side, and insist on the possibility to heavily overload notations, then we hope that the Mathemagix type system will be a convenient choice. We finally notice that signatures are now implemented under various names in a variety of languages. For instance, in Java, one may use the concept of an interface. Nevertheless, to the best of our knowledge, the current section describes the main lines along which signatures are conceived in current languages.
There are three main kinds of objects inside the Mathemagix type system: ordinary variables (including functions), classes and categories. Ordinary variables are defined using the following syntax:

<\mmx-code>
test?: Boolean == pred? x; // constant
flag?: Boolean := false;   // mutable
</mmx-code>

In this example, test? is a constant, whereas flag? is a mutable variable which can be given new values using the assignment operator :=. The actual type of the mutable variable flag? is Alias Boolean. Functions can be declared using a similar syntax:

<\mmx-code>
foo (x: Int): Int == x * x;
</mmx-code>

Mathemagix is a fully functional language, so that functions can both be used as arguments and as return values:

<\mmx-code>
shift (x: Int) (y: Int): Int == x + y;
iterate (foo: Int -> Int, n: Int)
        (x: Int): Int ==
  if n = 0 then x
  else iterate (foo, n-1) (foo x);
</mmx-code>

The return type and the types of part of the function arguments are allowed to depend on the arguments themselves. For instance:

<\mmx-code>
square (x: M, M: Monoid): M == x * x;
</mmx-code>

Mathemagix does not allow for mutually dependent arguments, but dependent arguments can be specified in an arbitrary order. New classes are defined using the class primitive, as in the following example:

<\mmx-code>
class Point == {
  mutable {
    x: Double;
    y: Double;
  }
  constructor point (x2: Double, y2: Double) == {
    x == x2;
    y == y2;
  }
}
</mmx-code>

We usually use similar names for constructors as for the class itself, but the user is free to pick other names. The keyword mutable specifies that we have both read and read-write accessors p.x and p.y for the fields x and y. Contrary to C++, new accessors can be defined outside the class itself:

<\mmx-code>
postfix .length (p: Point): Double ==
  sqrt (p.x * p.x + p.y * p.y);
</mmx-code>

As in the case of functions, classes are allowed to depend on parameters, which may be either type parameters or ordinary values. Again, there may be dependencies among the parameters.
One simple example of a class definition with parameters is:

<\mmx-code>
class Num_Vec (n: Int) == {
  mutable v: Vector Double;
  constructor num_vec (c: Double) == {
    v == [ c | i: Int in 0..n ];
  }
}
</mmx-code>

Categories are the central concept for achieving genericity. We have already seen an example of the definition of a category in section 2. Again, categories may take parameters, with possible dependencies among them. For instance:

<\mmx-code>
category Module (R: Ring) == {
  This: Abelian_Group;
  infix *: (R, This) -> This;
}
</mmx-code>

As in Axiom or Aldor, the type This can occur in the category fields in many ways. In the above example, the line This: Abelian_Group means that Module R in particular includes all fields of Abelian_Group. More generally, This can be part of function argument types, of return types, or part of the declaration of an ordinary variable. For instance, the category To T below formalizes the concept of a type with an implicit converter to T.

<\mmx-code>
category Type == {}

category To (T: Type) == {
  convert: This -> T;
}
</mmx-code>

Given an ordinary type T, we write x: T if x is an instance of T. In the case of a category C, we write T: C if the type T satisfies the category, that is, if all category fields are defined in the current context, when replacing This by T. Contrary to Axiom or Aldor, it follows that Mathemagix is very name sensitive: if we want a type M to be a monoid, then we need a multiplication on M with the exact name *. Of course, wrappers can easily be defined if we want different names, but one of the design goals of Mathemagix is that it should be particularly easy to consistently use standard names. The main strength of the Mathemagix type system is that it allows for heavy though fully type safe overloading.
Similarly as in C++ or Java, discrete overloading of a symbol is achieved by declaring it several times with different types:

<\mmx-code>
infix * (c: Double, p: Point): Point ==
  point (c * p.x, c * p.y);
infix * (p: Point, c: Double): Point ==
  point (p.x * c, p.y * c);
</mmx-code>

Contrary to C++, non function variables and return values of functions can also be overloaded:

<\mmx-code>
bar: Int == 11111;
bar: String == "Hello";
mmout << bar * bar << lf;
mmout << bar << " John!" << lf;
</mmx-code>

Internally, the type system associates a special type Or (Int, String) to the overloaded variable bar. During function applications, Mathemagix consistently takes into account all possible meanings of the arguments and returns a possibly overloaded value which corresponds to all possible meanings of the function application. For instance, consider the overloaded function

<\mmx-code>
foo (x: Int): Int == x + x;
foo (s: String): String == reverse s;
</mmx-code>

Then the expression foo bar will be assigned the type Or (Int, String). An example of a truly ambiguous expression would be bar = bar, since it is unclear whether we want to compare the integers 11111 or the strings "Hello". True ambiguities will provoke compile time errors. The second kind of parametric overloading relies on the forall keyword. The syntax is similar to template declarations in C++, with the difference that all template parameters should be rigorously typed:

<\mmx-code>
forall (M: Monoid)
fourth_power (x: M): M == x * x * x * x;
</mmx-code>

Internally, the type system associates a special type Forall (M: Monoid, M -> M) to the overloaded function fourth_power. In a similar way, values themselves can be parametrically overloaded. The main challenge for the type system is to compute consistently with intersection types and universally quantified types. For instance, we may define the notation [ ... ] for vectors using

<\mmx-code>
forall (T: Type)
operator [] (t: Tuple T): Vector T == vector t;
</mmx-code>

This notation in particular defines the empty vector [], which admits the universally quantified type Forall (T: Type, Vector T).
In particular, and contrary to what would have been the case in C++, it is not necessary to make the type of [] explicit as soon as we perform the template instantiation. Thus, writing

<\mmx-code>
v: Vector Int == [];
w: Vector Int == [] << []; // concatenation
</mmx-code>

would typically be all right. On the other hand, the expression #[] (size of the empty vector) is an example of a genuine and parametric ambiguity. In comparison with C++, it should be noticed in addition that parametric overloading is fully dynamic and that there are no restrictions on the use of ordinary variables as template parameters. Again, there may be dependencies between template arguments. Mathemagix also implements the mechanism of partial specialization. For instance, if we have a fast square routine for double precision numbers, then we may define

<\mmx-code>
fourth_power (x: Double): Double ==
  square square x;
</mmx-code>

Contrary to C++, partial specialization of a function takes into account both the argument types and the return type. This makes it more natural to use the partial specialization mechanism for functions for which not all template parameters occur in the argument types:

<\mmx-code>
forall (R: Number_Type) pi (): R == ...;
pi (): Double == ...;
</mmx-code>

One major difference between Mathemagix and C++ is that Mathemagix does not contain any mechanism for implicit conversions. Indeed, in C++, the mechanism of implicit conversions partially depends on heuristics, which makes its behaviour quite unpredictable in non trivial situations. We have done a lot of experimentation with the introduction of implicit conversions in the Mathemagix type system, and decided to ban them from the core language. Indeed, systematic implicit conversions introduce too many kinds of ambiguities, which are sometimes of a very subtle nature. Nevertheless, the parametric overloading facility makes it easy to emulate implicit conversions, with the additional benefit that it can be made precise when exactly implicit conversions are permitted.
Indeed, we have already introduced the To category, defined by

<\mmx-code>
category To (T: Type) == {
  convert: This -> T;
}
</mmx-code>

Here convert is the standard operator for type conversions in Mathemagix. Using this category, we may define scalar multiplication for vectors by

<\mmx-code>
forall (M: Monoid, C: To M)
infix * (c: C, v: Vector M): Vector M ==
  [ (c :> M) * x | x: M in v ];
</mmx-code>

Here c :> M stands for the application of convert to c, retaining only the results of type M (recall that convert c might have several meanings due to overloading). This kind of emulated ``implicit'' conversion is so common that Mathemagix defines a special notation for it:

<\mmx-code>
forall (M: Monoid)
infix * (c :> M, v: Vector M): Vector M ==
  [ c * x | x: M in v ];
</mmx-code>

In particular, this mechanism can be used in order to define converters with various kinds of transitivity:

<\mmx-code>
convert (x :> Integer): Rational == x / 1;
convert (cp: Colored_Point) :> Point == cp.p;
</mmx-code>

The first example is also called an upgrader and provides a simple way for the construction of instances of more complex types from instances of simpler types. The second example is called a downgrader and can be used in order to customize type inheritance, in a way which is unrelated to the actual representation types in memory. The elimination of genuine implicit converters also allows for several optimizations in the compiler. Indeed, certain operations such as multiplication can be overloaded hundreds of times in non trivial applications. In the above example of scalar multiplication, the compiler takes advantage of the fact that at least one of the two arguments must really be a vector. This is done using a special table lookup mechanism for retaining only those few overloaded values which really have a chance of succeeding when applying a function to concrete arguments. Functions with several arguments use a classical tuple notation.
It would have been possible to follow the OCaml and Haskell conventions, which rely on currying, and rather regard a binary function (S, T) -> U as a function of type S -> (T -> U). Although this convention is more systematic and eases the implementation of a compiler, it is also non standard in mainstream mathematics; in Mathemagix, we have chosen to keep the syntax as close as possible to classical mathematics. Furthermore, currying may be a source of ambiguities in combination with overloading. For instance, the expression - 1 might be interpreted as the unary negation - applied to 1, or as the function x :-> x - 1. In order to accommodate functions with an arbitrary number of arguments and lazy streams of arguments, Mathemagix uses a limited amount of syntactic sugar. Given a type T, the type Tuple T stands for an arbitrary tuple of arguments of type T, and Generator T stands for a lazy stream of arguments of type T. For instance, (1, 2, 3) would be a typical tuple of type Tuple Int and 1..10 a typical generator of type Generator Int. For instance, the prototype of a function which evaluates a multivariate polynomial at a tuple of points might be

<\mmx-code>
forall (R: Ring)
eval (P: MVPol R, p: Tuple R): R == ...;
</mmx-code>

The syntactic sugar takes care of the necessary conversions between tuples and generators. For instance, given a polynomial P, the following would be valid evaluations:

<\mmx-code>
eval (P, 1, 2..8, (9, 10), 11..20);
eval (P, (i^2 | i: Int in 0..100));
</mmx-code>

Notice that the notation of function application (or evaluation) can be overloaded itself:

<\mmx-code>
postfix .() (fs: Vector (Int -> Int),
             x: Int): Vector Int ==
  [ f x | f: Int -> Int in fs ];
</mmx-code>

There are various natural and planned extensions of the current type system. One of the most annoying problems that we are currently working on concerns literal integers: the expression 123 can naturally be interpreted as a machine Int or as a long Integer. Consequently, it is natural to consider 123 to be of type Or (Int, Integer).
For efficiency reasons, it is also natural to implement each of the following operations:

<\mmx-code>
infix =: (Int, Int) -> Boolean;
infix =: (Integer, Integer) -> Boolean;
infix =: (Int, Integer) -> Boolean;
infix =: (Integer, Int) -> Boolean;
</mmx-code>

This makes an expression such as 1 = 1 highly ambiguous. Our current solution permits the user to prefer certain operations or types over others. For instance, we would typically prefer the type Integer over Int, since Int arithmetic might overflow. However, we still might prefer (Int, Int) -> Boolean over (Integer, Integer) -> Boolean. Indeed, given x: Int, we would like the test x = 0 to be executed fast. One rather straightforward extension of the type system is to consider other ``logical types''. Logical implication is already implemented using the assume primitive:

<\mmx-code>
forall (R: Ring) {
  ...
  assume (R: Ordered)
  sign (P: Polynomial R): Int ==
    if P = 0 then 0 else sign P[deg P];
  ...
}
</mmx-code>

The implementation of existential types will allow us to write routines such as

<\mmx-code>
forall (K: Field) exists (L: Algebraic_Extension K)
roots (p: Polynomial K): Vector L == ...;
</mmx-code>

Similarly, we plan the implementation of union types and abstract data types, together with various pattern matching utilities similar to those found in OCaml and Haskell. We also plan to extend the syntactic sugar. For instance, given two aliases, we would like suitable notations for operating on both of them simultaneously. A macro facility should also be included, comparable to the one that can be found in Lisp. Some further syntactic features might be added for specific areas. For instance, in the Macaulay2 system, one may use the declaration

<\verbatim-code>
R = ZZ[x,y]
</verbatim-code>

for the simultaneous introduction of the polynomial ring ZZ[x,y] and the two coordinate functions x and y. In the longer future, we would like to be able to formally describe mathematical properties of categories and algorithms, and provide suitable language constructs for supplying partial or complete correctness proofs.
In order to specify the semantics of the language, it is useful to forget about all syntactic sugar and schematize the language by remaining as close as possible to more conventional typed λ-calculus. Source programs can be represented in a suitable variant of typed λ-calculus, extended with special notations for categories and overloading. We will use a few notational conventions. For the sake of brevity, we will now use superscripts for specifying types. For instance, (λ x^Int . x)^(Int -> Int) denotes the identity function Int -> Int, x :-> x. For the sake of readability, we will also denote types T, U, ... using capitalized identifiers and categories C, D, ... using bold capitalized identifiers. Similarly, we will use the terms ``type expressions'' and ``category expressions'' whenever an expression should be considered as a type or a category. Notice however that this terminology is not formally enforced by the language itself. The source language contains three main components. The first component consists of ordinary typed λ-expressions, and notations for their types:

<\enumerate>
Given expressions f and x, we denote function application by f (x), f x, or fx.

Given a variable x, an expression y and type expressions T and U, we denote by (λ x^T . y)^U the lambda expression which sends x of type T to y of type U. We will denote by T -> U the type of the above λ-expression. In the case when U depends on x, we will rather write (x: T) -> U for this type. Hence, all lambda expressions are typed and there are no syntactic constraints on the types T and U. However, ``badly typed'' expressions such as (λ x^Int . x)^(String -> String) will have no correct interpretation in the sense of the semantics outlined below.
</enumerate>

The second part of the language concerns declarations of recursive functions, classes and categories:

<\enumerate>
Given variables x_1, ..., x_n, type expressions T_1, ..., T_n and expressions y_1, ..., y_n, z, we may form the expression (x_1^T_1 := y_1, ..., x_n^T_n := y_n) |- z. The informal meaning is: the expression z, with mutually recursive bindings x_1^T_1 := y_1, ..., x_n^T_n := y_n.

Given variables x_1, ..., x_n and type expressions T_1, ..., T_n, we may form the data type Class (x_1^T_1, ..., x_n^T_n).
</enumerate>
For instance, a list of integers might be declared using a binding of the form (List^{Class} := Class⟨head^{Int}, tail^{List}⟩) ⊢ z. We also introduce a special variable This, which will be the type of Class⟨x_1^{T_1}, ..., x_n^{T_n}⟩.

Given variables x_1, ..., x_n and type expressions T_1, ..., T_n, we may form the category Cat⟨x_1^{T_1}, ..., x_n^{T_n}⟩. For instance, we might introduce the Monoid category using (Monoid := Cat⟨*^{(This, This) → This}⟩) ⊢ z.

The last part of the language includes explicit constructs for overloaded expressions and their types:

<\enumerate>
Given two expressions x and y, we may form the overloaded expression x ∧ y.

Given type expressions T and U, we may form the intersection type T ∧ U.

Given a variable x, a type expression T and an expression y, we may form the parametrically overloaded expression ⋀x^{T}. y.

Given a variable x, a type expression T and a type expression U, we may form the universally quantified type expression ⋀x^{T}. U.
</enumerate>

In the last two cases, the variable x is often (but not necessarily) a type variable X and its type T a category C. The source language allows us to define an overloaded function such as

<\eqnarray*>
f : (Int → Int) ∧ (String → String)
f := (λx^{Int}. x) ∧ (λx^{String}. x)
</eqnarray*>

In a context where 1 is of type Int, it is the job of the compiler to recognize that f should be interpreted as a function of type Int → Int in the expression f(1). In order to do so, we first extend the source language with a few additional constructs for disambiguating overloaded expressions. The extended language will be called the target language. In a given context Γ, we next specify when a source expression x can be interpreted as a non-ambiguous expression x̂ in the target language. In that case, we will write Γ ⊢ x ⇝ x̂, and the expression x̂ will always admit a unique type. For instance, for f as above, we introduce operators π_L and π_R for accessing the two possible meanings of f, so that

<\eqnarray*>
Γ = {f^{(Int → Int) ∧ (String → String)}, 1^{Int}}
f(1) ⇝ (π_L f)(1).
</eqnarray*>

For increased clarity, we will freely annotate target expressions by their types when appropriate. For instance, we might have written ((π_L f)^{Int → Int}(1^{Int}))^{Int} instead of (π_L f)(1).
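At the user level, an overloaded identity function of the above kind corresponds to a sketch like the following (the identifier f and its bodies are illustrative assumptions):

<\mmx-code>
f (x: Int): Int == x;
f (x: String): String == x;

f 1;    // resolved to the Int branch, i.e. to (pi_L f)(1)
</mmx-code>

In other words, discrete overloading at the source level is exactly what the intersection type (Int → Int) ∧ (String → String) schematizes in the typed λ-calculus.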
In the target language, the following notations will be used for disambiguating overloaded expressions:

<\enumerate>
Given an expression x, we may form the expressions π_L x and π_R x.

Given expressions x and y, we may form the expression x⟨y⟩. Here x should be regarded as a template and x⟨y⟩ as its specialization at y.
</enumerate>

There are many rules for specifying how source expressions may be interpreted. We list a few of them:

<\equation*>
\frac{Γ ⊢ x ⇝ x̂^{T ∧ U}}{Γ ⊢ x ⇝ (π_L x̂)^{T}}
\qquad
\frac{Γ ⊢ f ⇝ f̂^{T → U} \quad Γ ⊢ y ⇝ ŷ^{T}}{Γ ⊢ f(y) ⇝ (f̂ ŷ)^{U}}
</equation*>

<\equation*>
\frac{Γ ⊢ x ⇝ x̂^{⋀y^{T}. U} \quad Γ ⊢ z ⇝ ẑ^{T}}{Γ ⊢ x ⇝ (x̂⟨ẑ⟩)^{U[ẑ/y]}}
</equation*>

<\equation*>
\frac{Γ ⊢ T ⇝ T̂ \quad Γ ∪ \{x^{T̂}\} ⊢ y ⇝ ŷ^{Û}}{Γ ⊢ (λx^{T}. y^{U}) ⇝ (λx^{T̂}. ŷ)^{T̂ → Û}}
</equation*>

Here U[ẑ/y] stands for the substitution of ẑ for y in U.

The second kind of extensions in the target language concerns notations for specifying how types match categories:

<\enumerate>
Given expressions T, f_1, ..., f_n and C, we may form the expression (T with ⟨f_1, ..., f_n⟩) :: C. The informal meaning of this expression is ``the type T considered as an instance of C, through specification of the structure f_1, ..., f_n''.

Given a type expression T, we may form T°, meaning ``forget the category of T''.

Given expressions x and T, we may form the expression x :> T, which allows us to cast x to a type of the form T = (U with ⟨f_1, ..., f_n⟩) :: C.

Given an expression x, we may similarly form x°.
</enumerate>

In order to cast a given type T° to a given category C = Cat⟨x_1^{T_1}, ..., x_n^{T_n}⟩, all fields of the category should admit an interpretation in the current context:

<\equation*>
\frac{∀i, \quad Γ ⊢ T_i[T°/This] ⇝ T̂_i \quad Γ ⊢ x_i ⇝ x̂_i^{T̂_i}}{Γ ⊢ T ⇝ (T° with ⟨x̂_1, ..., x̂_n⟩) :: C}.
</equation*>

Assuming in addition that Γ ⊢ y ⇝ ŷ^{T}, we also have Γ ⊢ y ⇝ ŷ^{(T° with ⟨x̂_1, ..., x̂_n⟩) :: C}. There are further rules for casting down.

A target expression x̂^{T} is said to be reduced if its type is not of the form T ∧ U, ⋀x^{T}. U, or (U with ⟨f_1, ..., f_n⟩) :: C. The task of the compiler is to recursively determine all reduced interpretations of all subexpressions of a source program. Since each subexpression may have several interpretations, we systematically try to represent the set of all possible reduced interpretations by a conjunction x̂ of universally quantified expressions.
In case of success, this target expression x̂ will be the result of the compilation of x in the relevant context Γ, and we will write Γ ⊢ x ⇝* x̂. Let us illustrate this idea on two examples. With f as in (1) and the constant 1 itself overloaded, there are two reduced interpretations of f(1):

<\eqnarray*>
Γ = {f^{(Int → Int) ∧ (String → String)}, 1^{Int ∧ String}}
f(1) ⇝ ((π_L f)(π_L 1))^{Int},
</eqnarray*>

<\eqnarray*>
Γ = {f^{(Int → Int) ∧ (String → String)}, 1^{Int ∧ String}}
f(1) ⇝ ((π_R f)(π_R 1))^{String}.
</eqnarray*>

Hence, the result of the compilation of f(1) is given by

<\eqnarray*>
Γ = {f^{(Int → Int) ∧ (String → String)}, 1^{Int ∧ String}}
f(1) ⇝* ((π_L f)(π_L 1))^{Int} ∧ ((π_R f)(π_R 1))^{String}.
</eqnarray*>

In a similar way, the result of compilation may be a parametrically overloaded expression:

<\eqnarray*>
Γ = {f^{⋀X^{C}. X → X}, 1^{Int}}
f ⇝* ⋀X^{C}. (f⟨X⟩)^{X → X}.
</eqnarray*>

Sometimes, the result x̂ of the compilation of x is a conjunction which contains at least two expressions of the same type. In that case, x is truly ambiguous, so the compiler should return an error message, unless we can somehow resolve the ambiguity. In order to do this, the idea is to define a partial preference relation ⊑ on target expressions and to keep only those expressions in the conjunction x̂ which are maximal for this relation. For instance, assume that we have a function f of type (⋀X^{C}. X → X) ∧ (Int → Int) and the constant 1 of type Int. We have seen above that Mathemagix supports partial specialization. Now the second, specialized branch is a partial specialization of the first, generic one, but not the inverse. Consequently, we should strictly prefer the interpretation (π_R f)(1) over ((π_L f)⟨Int'⟩)(1), where Int' = (Int with ⟨…⟩) :: C. As indicated earlier, we are currently investigating further extensions of the preference relation ⊑ through user provided preference rules.

In the absence of universal quantification, the search process for all reduced interpretations can in principle be designed to be finite and complete. The most important implementation challenge for the compiler therefore concerns universal quantification. The main idea behind the current implementation is that all pattern matching is done in two stages: at the first stage, we propose possible matches for free variables introduced during unification of quantified expressions.
At a second stage, we verify that the proposed matches satisfy the necessary categorical constraints, and we rerun the pattern matching routines for the actual matches. When proceeding this way, it is guaranteed that casts of a type to a category never involve free variables.

Let us illustrate the idea on the simple example of computing a square. So assume that we have the function square of type ⋀M^{Monoid}. M → M in our context, as well as a multiplication *: (Int, Int) → Int. In order to compile the expression square(2^{Int}), the algorithm will attempt to match Int → α against ⋀M^{Monoid}. M → M for some free variable α. At a first stage, we introduce a new free variable M° and match M° → M° against Int → α. This check succeeds with the bindings M° := Int and α := Int, but without performing any type checking for these bindings. At a second stage, we have to resolve the innermost binding M° := Int and cast Int to Monoid. This results in the correct proposal M° := Int' for the free variable, where Int' = (Int with ⟨*^{(Int, Int) → Int}⟩) :: Monoid. We finally rematch Int' → α with Int → α and find the return type Int'.

In practice the above idea works very well. Apart from the more pathological theoretical problems that will be discussed below, the only practically important problem that we do not treat currently is finding a ``smallest common supertype'' with respect to type conversions. For instance, let + be a function of type ⋀R^{Ring}. (R, R) → R. What should be the type of x + y, where x^{R_1} and y^{R_2} are such that R_1 and R_2 are different? Theoretically speaking, this should be the type ⋀R^{C}. R, where C is the category Cat⟨R_1 → This, R_2 → This⟩. However, the current pattern matching mechanism in the compiler will not find this type.

It is easy to write programs which make the compiler fail or loop forever. For instance, given a context with the category C := Cat⟨Pol(This) → This⟩ and functions f and x of types ⋀X^{C}. X → X and Int, the compilation of f(x) will loop. Indeed, the compiler will successively search for converters of the form Pol(Int) → Int, Pol(Pol(Int)) → Pol(Int), etc.
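At the user level, the looping situation just described can be provoked by a category whose only operation consumes a more complex instance of This; the following is a hedged sketch, with the category name Weird and the concrete declaration syntax being assumptions for illustration:

<\mmx-code>
category Weird == {
  convert: Pol This -> This;
}

forall (X: Weird)
f (x: X): X == ...;
</mmx-code>

In order to interpret f x for some x: Int, the compiler must cast Int to Weird, which requires a converter Pol Int -> Int; the search for that converter re-triggers the same kind of cast for Pol Int, and so on.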
Currently, some safeguards have been integrated which make the compiler abort with an error message when it enters this kind of loop. The expressiveness of the type system actually makes it possible to encode any first order theory directly in the system. For instance, given a binary predicate P and function symbols f and g, a universally quantified implication between instances of P, f and g can be encoded through the declaration of a function whose universally quantified type carries the hypothesis as a categorical constraint and whose return type represents the conclusion. These negative remarks are counterbalanced by the fact that the type system is not intended to prove mathematical theorems, but rather to make sense out of commonly used overloaded mathematical notations. It is up to the user to employ the type system in order to define such common notations, and not to misuse it in order to prove general first order statements. Since notations are intended to be easily understandable in the first place, they can usually be given a sense by following simple formal procedures. We believe that our type system is powerful enough to cover most standard notations in this sense.

The above discussion shows that we do not aim at completeness for the type system. So what about soundness? The rules for interpretation are designed in such a way that all interpretations are necessarily correct. The only problems which can therefore occur are that the compiler loops forever, or that it is not powerful enough to automatically find certain non-trivial interpretations. We also notice that failure of the compiler to find the intended meaning does not necessarily mean that we will get an error message or that the compiler does not terminate. Indeed, theoretically speaking, we might obtain a correct interpretation, even though the intended interpretation should have been preferred. In particular, it is important to use the overloading facility in such a way that all possible interpretations are always correct, even though some of them may be preferred.
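This guideline can be illustrated with the overloaded equality operations from section 3: as long as the conversion from Int to Integer preserves the value, all interpretations of a test such as n = 1 agree, so the compiler may safely prefer the fastest one. A sketch, with the bodies elided:

<\mmx-code>
convert (n: Int): Integer == ...;          // value preserving
infix =: (Int, Int) -> Boolean;            // fast machine comparison
infix =: (Integer, Integer) -> Boolean;    // long integer comparison
</mmx-code>

Overloading only becomes dangerous when two interpretations of the same expression denote genuinely different values.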
Given an expression x on which the compilation process succeeds, we finally have to show what it means to evaluate x. So let x̂ with Γ ⊢ x ⇝* x̂ be the expression in the target language which is produced by the compiler. The target language has the property that it is quite easy to ``downgrade'' x̂ into an expression of classical untyped λ-calculus. This reduces the evaluation semantics of x to the one of this calculus. Some of the most prominent rules for rewriting x̂ into a term of classical untyped λ-calculus are the following:

<\enumerate>
Overloaded expressions x ∧ y are rewritten as pairs λf. f x y. The projections π_L and π_R are simply π_L := λx. λy. x and π_R := λx. λy. y.

Template expressions ⋀x^{T}. y are rewritten as λ-expressions λx. y. Template instantiation x⟨y⟩ is rewritten into function application x y.

Instances (T with ⟨x_1, ..., x_n⟩) :: C of categories are implemented as n-tuples λf. f x_1 ... x_n.
</enumerate>

For instance, consider the template square := ⋀M^{Monoid}. (λx^{M}. x * x)^{M → M}. After compilation, this template is transformed into the expression λM. λx. (π_1 M) x x, where π_1 := λt. t (λx_1. x_1).

One of the aims of the actual compiler is to be compatible with existing C++ libraries and template libraries. For this reason, the backend of the compiler really transforms expressions in the target language into C++ programs instead of terms of untyped λ-calculus.

<\bibliography|bib|tm-plain|all.bib>
<\bib-list|37>
The Axiom computer algebra system. http://wiki.axiom-developer.org/FrontPage.

G. Baumgartner and V. F. Russo. Implementing signatures for C++. 19(1):153-187, 1997.

E. Bond, M. Auslander, S. Grisoff, R. Kenney, M. Myszewski, J. Sammet, R. Tobey and S. Zilles. FORMAC: an experimental formula manipulation compiler. ACM '64, 112-101. New York, NY, USA, 1964. ACM.

T. Coquand and G. Huet. The calculus of constructions. 76(2-3):95-120, 1988.

T. Coquand et al. The Coq proof assistant. http://coq.inria.fr/, 1984.

D. Eisenbud, D. R. Grayson, M. E. Stillman and B. Sturmfels. Springer-Verlag, London, UK, 2002.

J.-C. Faugère. FGb: a library for computing Gröbner bases.
In K. Fukuda, J. van der Hoeven, M. Joswig and N. Takayama, editors, 6327, 84-87. Berlin, Heidelberg, September 2010. Springer.

K. Geddes, G. Gonnet and Maplesoft. Maple. http://www.maplesoft.com/products/maple/, 1980.

J. Y. Girard. Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination de coupures dans l'analyse et la théorie des types. In J. E. Fenstad, editor, 63-92. North-Holland Publishing Co., 1971.

T. Granlund et al. GMP, the GNU multiple precision arithmetic library. http://www.swox.com/gmp, 1991.

D. R. Grayson and M. E. Stillman. Macaulay2, a software system for research in algebraic geometry. Available at .

J. H. Griesmer, R. D. Jenks and D. Y. Y. Yun. Computer Science Department monograph series. IBM Research Division, 1975.

G. Hanrot, V. Lefèvre, K. Ryde and P. Zimmermann. MPFR, a C library for multiple-precision floating-point computations with exact rounding. http://www.mpfr.org, 2000.

W. Hart. An introduction to Flint. In K. Fukuda, J. van der Hoeven, M. Joswig and N. Takayama, editors, 6327, 88-91. Springer, 2010.

J. van der Hoeven, G. Lecerf, B. Mourrain et al. Mathemagix. 2002. http://www.mathemagix.org.

J. van der Hoeven, G. Lecerf, B. Mourrain, P. Trébuchet, J. Berthomieu, D. Diatta and A. Manzaflaris. Mathemagix, the quest of modularity and efficiency for symbolic and certified numeric computation. 45(3/4):186-188, 2012.

P. Hudak, J. Hughes, S. Peyton Jones and P. Wadler. A history of Haskell: being lazy with class. HOPL III, 12-1. New York, NY, USA, 2007. ACM.

R. D. Jenks. The SCRATCHPAD language. 9(4):101-111, March 1974.

R. D. Jenks. Modlisp: an introduction (invited). EUROSAM '79, 466-480. London, UK, 1979. Springer-Verlag.

R. D. Jenks and B. M. Trager. A language for computational algebra. 16(11):22-29, 1981.

R. D. Jenks and R. Sutor. Springer-Verlag, New York, NY, USA, 1992.

X. Leroy et al. OCaml. http://caml.inria.fr/ocaml/, 1996.

W. A. Martin and R. J. Fateman. The MACSYMA system. SYMSAC '71, 59-75.
New York, NY, USA, 1971. ACM.

P. Martin-Löf. Constructive mathematics and computer programming. 153-175, 1979.

R. Milner. A theory of type polymorphism in programming. 17:348-375, 1978.

J. Moses. Macsyma: a personal history. 47(2):123-130, 2012.

The Maxima computer algebra system (free version). http://maxima.sourceforge.net/, 1998.

T. Nipkow, L. Paulson and M. Wenzel. Isabelle/HOL. http://www.cl.cam.ac.uk/research/hvg/Isabelle/, 1993.

S. Peyton Jones et al. The Haskell 98 language and libraries: the revised report. 13(1):0-255, January 2003.

G. Dos Reis and B. Stroustrup. Specifying C++ concepts. 41(1):295-308, 2006.

A. Saïbi. Typing algorithm in type theory with inheritance. POPL '97, 292-301. ACM, 1997.

W. A. Stein et al. The Sage Development Team, 2004. http://www.sagemath.org.

R. S. Sutor and R. D. Jenks. The type inference and coercion facilities in the Scratchpad II interpreter. 22(7):56-63, 1987.

B. Stroustrup. Addison-Wesley, 2nd edition, 1995.

S. Watt, P. A. Broadbery, S. S. Dooley, P. Iglio, S. C. Morrison, J. M. Steinbach and R. S. Sutor. A first report on the A# compiler. ISSAC '94, 25-31. New York, NY, USA, 1994. ACM.

S. Watt et al. Aldor programming language. http://www.aldor.org/, 1994.

Wolfram Research. Mathematica. http://www.wolfram.com/mathematica/, 1988.