diff --git a/doc/gf-modules.html b/doc/gf-modules.html new file mode 100644 index 000000000..2ca52d598 --- /dev/null +++ b/doc/gf-modules.html @@ -0,0 +1,553 @@ + + + + +
+ + + + + +

The Module System of GF

+ +

+ +8/4/2005 + +

+ +Aarne Ranta + +

+ +A GF grammar consists of a set of modules, which can be +combined in different ways to build different grammars. +There are several different types of modules: + +We will go through the module types in this order, which is also +their order of "importance" from the most frequently used to +the more esoteric/advanced ones. + +

+ +This document is meant as an appendix to the GF tutorial, and +presupposes knowledge of GF judgements and expressions. It aims +just to tell what module system adds to the old functionality; +some information is repeated to give understanding on how the +module system related to the already familiar uses of GF grammars. + + + +

Abstract syntax

+ +Any GF grammar that is used in an application +will probably contain at least one module +of the abstract module type. Here is an example of +such a module, defining a fragment of propositional logic. +
+  abstract Logic = {
+    cat Prop ;
+    fun Conj : Prop -> Prop -> Prop ;
+    fun Disj : Prop -> Prop -> Prop ;
+    fun Impl : Prop -> Prop -> Prop ;
+    fun Falsum : Prop ;
+    }
+
+The name of this module is Logic. + +

+ +An abstract module defines an abstract syntax, which +is a language-independent representation of a fragment of language. +It consists of two kinds of judgements: +

+There can also be def and data judgements in an +abstract syntax. + + +

Compilation of abstract syntax

+ +The GF grammar compiler expects to find the module Logic in a file named +Logic.gf. When the compiler is run, it produces +another file, named Logic.gfc. This file is in the +format called canonical GF, which is the "machine language" +of GF. Next time that the module Logic is needed in +compiling a grammar, it can be read from the compiled (gfc) +file instead of the source (gf) file, unless the source +has been changed after the compilation. + + +

Concrete syntax

+ +In order for a GF grammar to describe a concrete language, the abstract +syntax must be completed with a concrete syntax of it. +For this purpose, we use modules of type concrete: for instance, +
+  concrete LogicEng of Logic = {
+    lincat Prop = {s : Str} ;
+    lin Conj a b = {s = a.s ++ "and" ++ b.s} ;
+    lin Disj a b = {s = a.s ++ "or"  ++ b.s} ;
+    lin Impl a b = {s = "if" ++ a.s ++ "then"  ++ b.s} ;
+    lin Falsum = {s = ["we have a contradiction"]} ;
+    }
+
+The module LogicEng is a concrete syntax of the +abstract syntax Logic. The GF grammar compiler checks that +the concrete is valid with respect to the abstract syntax of +which it is claimed to be. The validity requires that there has to be + +Validity also requires that the linearization functions defined by +lin judgements are type-correct with respect to the +linearization types of the arguments and value of the function. + +

+ +There can also be lindef and printname judgements in a +concrete syntax. + + +

Top-level grammar

+ +When a concrete module is successfully compiled, a gfc +file is produced in the same way as for abstract modules. The +pair of an abstract and a corresponding concrete module +is a top-level grammar, which can be used in the GF system to +perform various tasks. The most fundamental tasks are + +In the current grammar, infinitely many trees and strings are recognized, although +no very interesting ones. For example, the tree +
+  Impl (Disj Falsum Falsum) Falsum
+
+has the linearization +
+  if we have a contradiction or we have a contradiction then we have a contradiction
+
+which in turn can be parsed uniquely as that tree. + + +

Compiling top-level grammars

+ +When GF compiles the module LogicEng it also has to compile +all modules that it depends on (in this case, just Logic). +The compilation process starts with dependency analysis to find +all these modules, recursively, starting from the explicitly imported one. +The compiler then reads either gf or gfc files, in +a dependency order. The decision on which files to read depends on +time stamps and dependencies in a natural way, so that all and only +those modules that have to be compiled are compiled. (This behaviour can +be changed with flags, see below.) + + +

Using top-level grammars

+ +To use a top-level grammar in the GF system, one uses the import +command (short name i). For instance, +
+  i LogicEng.gf
+
+It is also possible to specify the imported grammar(s) on the command +line when invoking GF: +
+  gf LogicEng.gf
+
+Various compilation flags can be added to both ways of compiling a module: + +Importing a grammar makes it visible in GF's internal state. To see +what modules are available, use the command print_options (po). +You can empty the state with the command empty (e); this is +needed if you want to read in grammars with a different abstract syntax +than the current one without exiting GF. + +

+ +Grammar modules can reside in different directories. They can then be found +by means of a search path, which is a flag such as +

+  -path=.:../prelude
+
+given to the import command or the shell command invoking GF. +(It can also be defined in the grammar file; see below.) The compiler +writes every gfc file in the same directory as the corresponding +gf file. + +

+ +Parsing and linearization can be performed with the parse +(p) and linearize (l) commands, respectively. +For instance, +

+  > l Impl (Disj Falsum Falsum) Falsum
+   if we have a contradiction or we have a contradiction then we have a contradiction
+
+  > p -cat=Prop "we have a contradiction"
+  Falsum
+
+Notice that the parse command needs the parsing category +as a flag. This necessary since a grammar can have several +possible parsing categories ("entry points"). + + + +

Multilingual grammar

+ +One abstract syntax can have several concrete syntaxes. +Here are two new ones for Logic: +
+  concrete LogicFre of Logic = {
+    lincat Prop = {s : Str} ;
+    lin Conj a b = {s = a.s ++ "et" ++ b.s} ;
+    lin Disj a b = {s = a.s ++ "ou"  ++ b.s} ;
+    lin Impl a b = {s = "si" ++ a.s ++ "alors"  ++ b.s} ;
+    lin Falsum = {s = ["nous avons une contradiction"]} ;
+    }
+
+  concrete LogicSymb of Logic = {
+    lincat Prop = {s : Str} ;
+    lin Conj a b = {s = "(" ++ a.s ++ "&" ++ b.s ++ ")"} ;
+    lin Disj a b = {s = "(" ++ a.s ++ "v" ++ b.s ++ ")"} ;
+    lin Impl a b = {s = "(" ++ a.s ++ "->" ++ b.s ++ ")"} ;
+    lin Falsum = {s = "_|_"} ;
+    }
+
+The four modules Logic, LogicEng, LogicFre, and +LogicSymb together form a multilingual grammar, in which +it is possible to perform parsing and linearization with respect to any +of the concrete syntaxes. As a combination of parsing and linearization, +one can also perform translation from one language to another. +(By language we mean the set of expressions generated by one +concrete syntax.) + + +

Using multilingual grammars

+ +Any combination of abstract syntax and corresponding concrete syntaxes +is thus a multilingual grammar. With many languages and other enrichments +(as described below), a multilingual grammar easily grows to the size of +tens of modules. The grammar developer, having finished her job, can +package the result in a multilingual canonical grammar, a file +with the suffix .gfcm. For instance, to compile the set of grammars +described by now, the following sequence of GF commands can be used: +
+  i LogicEng.gf
+  i LogicFre.gf
+  i LogicSymb.gf
+  pm | wf logic.gfcm
+
+The "end user" of the grammar only needs the file logic.gfcm to +access all the functionality of the multilingual grammar. It can be +imported in the GF system in the same way as .gf files. But +it can also be used in the Embedded Java Interpreter for GF to +build Java programs of which the multilingual grammar functionalities +(linearization, parsing, translation) form a part. + +

+ +In a multilingual grammar, the concrete syntax module names work as +names of languages that can be selected for linearization and parsing: +

+  > l -lang=LogicFre Impl Falsum Falsum
+  si nous avons une contradiction alors nous avons une contradiction
+
+  > l -lang=LogicSymb Impl Falsum Falsum
+  ( _|_ -> _|_ )
+
+  > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )"
+  Conj Falsum Falsum
+
+The option -multi gives linearization to all languages: +
+  > l -multi Impl Falsum Falsum
+  if we have a contradiction then we have a contradiction
+  si nous avons une contradiction alors nous avons une contradiction
+  ( _|_ -> _|_ )
+
+Translation can be obtained by using a pipe from a parser +to a linearizer: +
+  > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" | l -lang=LogicEng
+  if we have a contradiction then we have a contradiction
+
+ + +

Exercise

+ +Write yet another concrete syntax of Logic, for +a language or symbolic notation of your choice. + + +

Resource modules

+ +The concrete modules shown above would look much nicer if +we used the main idea of functional programming: avoid repetitive +code by using functions that capture repeated patterns of +expressions. A collection of such functions can be a valuable +resource for a programmer, reusable in many different +top-level grammars. Thus we introduce the resource +module type, with the first example +
+  resource Util = {
+    oper SS : Type = {s : Str} ;
+    oper ss : Str -> SS = \s -> {s = s} ;
+    oper paren : Str -> Str = \s -> "(" ++ s ++ ")" ;
+    oper infix : Str -> SS -> SS -> SS = \h,x,y ->
+      ss (x.s ++ h ++ y.s) ;
+    oper infixp : Str -> SS -> SS -> SS = \h,x,y ->
+      ss (paren (infix h x y)) ;
+    }
+
+Modules of resource type have two forms of judgement: + +A resource can be used in a concrete (or another +resource) by opening it. This means that +all operations (and parameter types) defined in the resource +module become usable in module that opens it. For instance, +we can rewrite the module LogicSymb much more concisely: +
+  concrete LogicSymb of Logic = open Util in {
+    lincat Prop = SS ;
+    lin Conj = infixp "&" ;
+    lin Disj = infixp "v" ;
+    lin Impl = infixp "->" ;
+    lin Falsum = ss "_|_" ;
+    }
+
+What happens when this variant of LogicSymb is +compiled is that the oper-defined constants +of Util are inlined in the +right-hand-sides of the judgements of LogicSymb, +and these expressions are partially evaluated, i.e. +computed as far as possible. The generated gfc file +will look just like the file generated for the first version +of LogicSymb - at least, it will do the same job. + +

+ +Several resource modules can be opened +at the same time. If the modules contain same names, the +conflict can be resolved by qualified opening and +reference. For instance, +

+  concrete LogicSymb of Logic = open Util, Prelude in { ...
+    } ;
+
+(where Prelude is a standard library of GF) brings +into scope two definitions of the constant SS. +To specify which one is used, you can write +Util.SS or Prelude.SS instead of just SS. +You can also introduce abbreviations to avoid long qualifiers, e.g. +
+  concrete LogicSymb of Logic = open (U=Util), (P=Prelude) in { ...
+    } ;
+
+which means that you can write U.SS and P.SS. + + +

Compiling resource modules

+ +The compilation of a resource module differs +from the compilation of abstract and +concrete modules because oper operations +do not in general have values in gfc. A gfc +file is generated, but it contains only +param judgements (also recall that opers +are inlined in their top-level use sites, so it is not +necessary to save them in the compiled grammar). +However, since computing the operations over and over +again can be time comsuming, and since type checking +resource modules also takes time, a third kind +of file is generated for resource modules: a .gfr +file. This file is written in the GF source code notation, +but it is type checked and type annotated, and opers +are computed as far as possible. + +

+ +If you look at any gfc or gfr file generated +by the GF compiler, you see that all names have been replaced by +their qualified variants. This is an important first step (after parsing) +the compiler does. As for the commands in the GF shell, some output +qualified names and some not. The difference does not always result +from firm principles. + + +

Using resource modules

+ +The typical use is through open in a +concrete module, which means that +resource modules are not imported on their own. +However, when developing and testing phase of grammars, it +can be useful to evaluate opers with different +arguments. To prevent them from being inlined away, the +-retain option can be used: +
+  > i -retain Util.gf
+
+The command compute_concrete (cc) +can now be used for evaluating expressions that may concain +operations defined in Util: +
+  > cc ss (paren "foo")
+  {s = "(" ++ "foo" ++ ")"}
+
+To find out, what opers are available for a given type, +the command show_operations (so) can be used: +
+  > so SS
+  Util.ss : Str -> SS ;
+  Util.infix : Str -> SS -> SS -> SS ;
+  Util.infixp : Str -> SS -> SS -> SS ;
+
+ + +

Exercise

+ +Rewrite the modules LogicEng and LogicFre +by making use of the resource. + + +

Inheritance

+ +The most characteristic modularity of GF lies in the division of +grammars into abstract, concrete, and +resource modules. This permits writing multilingual +grammar and sharing the maximum of code between different +languages. + +

+ +In addition to its special modularity, GF provides inheritance, +which is familiar from other programming languages (in particular, +object-oriented ones). Inheritance means that a module inherits all +judgements from another module; we also say that it extends +the other module. Inheritance is useful to divide big grammars into +smaller units, and also to reuse the same units in different bigger +grammars. + +

+ +The first example of inheritance is for abstract syntax. Let us +extend the module Logic to Arithmetic: +

+  abstract Arithmetic = Logic ** {
+    cat Nat ;
+    fun Even : Nat -> Prop ;
+    fun Odd  : Nat -> Prop ;
+    fun Zero : Nat ;
+    fun Succ : Nat -> Nat ;
+    }
+
+In parallel with the extension of the abstract syntax +Logic to Arithmetic, we can extend +the concrete syntax LogicEng to ArithmeticEng: +
+  concrete ArithmeticEng of Arithmetic = LogicEng ** open Util in {
+    lincat Nat = SS ;
+    lin Even x = ss (x.s ++ "is" ++ "even") ;
+    lin Odd x = ss (x.s ++ "is" ++ "odd") ;
+    lin Zero = ss "zero" ;
+    lin Succ x = ss ("the" ++ "successor" ++ "of" ++ x.s) ;
+    }
+
+Another extension of Logic is Geometry, +
+  abstract Geometry = Logic ** {
+    cat Point ;
+    cat Line ;
+    fun Incident : Point -> Line -> Prop ;
+    }
+
+The corresponding concrete syntax is left as exercise. + +

+ +Inheritance can be multiple, which means that a module +may extend many modules at the same time. Suppose, for instance, +that we want to build a module for mathematics covering both +arithmetic and geometry, and the underlying logic. We then write +

+  abstract Mathematics = Arithmetic, Geometry ** {
+    } ;
+
+We could of course add some new judgements in this module, but +it is not necessary to do so. + +

+ +The module Mathematics also shows that it is possibe +to extend a module already built by extension. The correctness +criterion for extensions is that the same name +(cat, fun, oper, or param) +may not be defined twice in the resulting union of names. +That the names defined in Logic are "inherited twice" +by Mathematics (via both Arithmetic and +Geometry) is no violation of this rule; the usual +problems of multiple inheritance do not arise, since +the definitions of inherited constants cannot be changed. + + +

Compiling inheritance

+ +Inherited judgements are not copied into the inheriting modules. +Instead, an indirection is created for each inherited name, +as can be seen by looking into the generated gfc (and +gfr) files. Thus for instance the names +
+  Mathematics.Prop  Arithmetic.Prop  Logic.Prop
+                    Geometry.Prop 
+
+all refer to the same category, declared in the module +Logic. + + + +

Inspecting grammar hierarchies

+ +The command visualize_graph (vg) shows the +dependency graph in the current GF shell state. The graph can +also be saved in a file and used e.g. in documentation. + + +

Reuse of top-level grammars as resources

+ + +

Interfaces, instances, and incomplete grammars

+ + +

Transfer modules

+ + + + + + +