diff --git a/doc/gf-refman.html b/doc/gf-refman.html index 325a2ad0f..1db2b0a87 100644 --- a/doc/gf-refman.html +++ b/doc/gf-refman.html @@ -100,10 +100,10 @@

This document is a reference manual to the GF programming language. GF, Grammatical Framework, is a special-purpose programming language, -designed to support definitions of grammars. +designed to support definitions of grammars.

-This document is not an introduction to GF; such introduction can be +This document is not an introduction to GF; such introduction can be found in the GF tutorial available on line on the GF web page,

@@ -118,7 +118,7 @@ do with the language specification.

This manual is meant to be fully compatible with GF version 3.0. Main discrepancies with version 2.8 are indicated, -as well as with the reference article on GF, +as well as with the reference article on GF,

A. Ranta, "Grammatical Framework. A Type Theoretical Grammar Formalism", @@ -139,17 +139,17 @@ As metalinguistic notation, we will use the symbols

Overview of GF

GF is a typed functional language, -borrowing many of its constructs from ML and Haskell: algebraic datatypes, +borrowing many of its constructs from ML and Haskell: algebraic datatypes, higher-order functions, pattern matching. The module system bears resemblance to ML (functors) but also to object-oriented languages (inheritance). The type theory used in the abstract syntax part of GF is inherited from logical frameworks, in particular ALF ("Another Logical Framework"; in a sense, GF is Yet Another ALF). From ALF comes also the use of dependent -types, including the use of explicit type variables instead of +types, including the use of explicit type variables instead of Hindley-Milner polymorphism.

-The look and feel of GF is close to Java and +The look and feel of GF is close to Java and C, due to the use of curly brackets and semicolons in structuring the code; the expression syntax, however, follows Haskell in using juxtaposition for function application and parentheses only for grouping. @@ -173,7 +173,7 @@ abstract syntax, however, is fully recursive.

Even though run-time GF grammars manipulate just nested tuples, at compile -time these are represented by by the more fine-grained labelled records +time these are represented by by the more fine-grained labelled records and finite functions over algebraic datatypes. This enables the programmer to write on a higher abstraction level, and also adds type distinctions and hence raises the level of checking of programs. @@ -187,7 +187,7 @@ The big picture of GF as a programming language for multilingual grammars explains its principal module structure. Any GF grammar must have an abstract syntax module; it can in addition have any number of concrete syntax modules matching that abstract syntax. Before going to details, -we give a simple example: a module defining the category A +we give a simple example: a module defining the category A of adjectives and one adjective-forming function, the zero-place function Even. We give the module the name Adj. The GF code for the module looks as follows: @@ -202,20 +202,20 @@ module looks as follows: Here are two concrete syntax modules, one intended for mapping the trees to English, the other to Swedish. The mappling is defined by lincat definitions assigning a linearization type to each category, -and lin definitions assigning a linearization to each function. +and lin definitions assigning a linearization to each function.

     concrete AdjEng of Adj = {
       lincat A = {s : Str} ;
       lin Even = {s = "even"} ;
     }
-  
+
     concrete AdjSwe of Adj = {
       lincat A = {s : AForm => Str} ;
       lin Even = {s = table {
-        ASg Utr   => "jämn" ;
-        ASg Neutr => "jämnt" ;
-        APl       => "jämna"
+        ASg Utr   => "jƤmn" ;
+        ASg Neutr => "jƤmnt" ;
+        APl       => "jƤmna"
         }
       } ;
       param AForm = ASg Gender | APl ;
@@ -262,7 +262,7 @@ much more module structure than strictly required in top-level grammars.
 

Inheritance, also known as extension, means that a module can inherit the -contents of one or more other modules to which new judgements are added, +contents of one or more other modules to which new judgements are added, e.g.

@@ -278,7 +278,7 @@ in several concrete syntaxes,
     resource MorphoFre = {
       param Number = Sg | Pl ;
       param Gender = Masc | Fem ;
-      oper regA : Str -> {s : Gender => Number => Str} = 
+      oper regA : Str -> {s : Gender => Number => Str} =
         \fin -> {
           s = table {
             Masc => table {Sg => fin ; Pl => fin + "s"} ;
@@ -288,7 +288,7 @@ in several concrete syntaxes,
     }
 

-By opening, a module can use the contents of a resource module +By opening, a module can use the contents of a resource module without inheriting them, e.g.

@@ -307,7 +307,7 @@ modules, e.g.
       oper Adjective : Type ;
       oper even_A : Adjective ;
     }
-  
+
     instance LexiconEng of Lexicon = {
       oper Adjective = {s : Str} ;
       oper even_A = {s = "even"} ;
@@ -315,7 +315,7 @@ modules, e.g.
 

Functors i.e. parametrized modules i.e. incomplete modules, defining -a concrete syntax in terms of an interface. +a concrete syntax in terms of an interface.

     incomplete concrete AdjI of Adj = open Lexicon in {
@@ -335,7 +335,7 @@ A functor can be instantiated by providing instances of its open interfac
 

The compilation unit of GF source code is a file that contains a module. Judgements outside modules are supported only for backward compatibility, -as explained here. +as explained here. Every source file, suffixed .gf, is compiled to a "GF object file", suffixed .gfo (as of GF Version 3.0 and later). For runtime grammar objects used for parsing and linearization, a set of .gfo files is linked to @@ -363,42 +363,42 @@ grammar.pgf Both .gf and .gfo files are written in the GF source language; .pgf files are written in a lower-level format. The process of translating .gf to .gfo consists of name resolution, type annotation, -partial evaluation, and optimization. +partial evaluation, and optimization. There is a great advantage in the possibility to do this -separately for GF modules and saving the result in .gfo files. The partial +separately for GF modules and saving the result in .gfo files. The partial evaluation phase, in particular, is time and memory consuming, and GF libraries are therefore distributed in .gfo to make their use less arduous.

-In GF before version 3.0, the object files are in a format called .gfc, +In GF before version 3.0, the object files are in a format called .gfc, and the multilingual runtime grammar is in a format called .gfcm.

The standard compiler has a built-in make facility, which finds out what -other modules are needed when compiling an explicitly given module. -This facility builds a dependency graph and decides which of the involved +other modules are needed when compiling an explicitly given module. +This facility builds a dependency graph and decides which of the involved modules need recompilation (from .gf to .gfo), and for which the -GF object can be used directly. +GF object can be used directly.

Names

Each module M defines a set of names, which are visible in M -itself, in all modules extending M (unless excluded, as explained +itself, in all modules extending M (unless excluded, as explained here), and all modules opening M. These names can stand for abstract syntax categories and functions, parameter types and parameter constructors, and operations. All these names live in the same name space, which means that a name entering a module more than once due to inheritance or -opening can lead to a conflict. It is specified +opening can lead to a conflict. It is specified here how these conflicts are resolved.

The names of modules live in a name space separate from the other names. -Even here, all names must be distinct in a set of files compiled to a +Even here, all names must be distinct in a set of files compiled to a multilingual grammar. In particular, even files residing in different directories -must have different names, since GF has no notion of hierarchic +must have different names, since GF has no notion of hierarchic module names.

@@ -439,7 +439,7 @@ Any of the parts extends, opens, and body may be empty. If they are all filled, delimiters and keywords separate the parts in the following way:

-moduletype name = +moduletype name = extends ** open opens in { body }
The part moduletype name looks slightly different if the @@ -447,18 +447,18 @@ type is concrete or instance: the name intrudes the type keyword and the name of the module being implemented and which really belongs to the type of the module:
- concrete name of abstractname + concrete name of abstractname
The only exception to the schema of functor syntax is functor instantiations: the instantiation list is given in a special way between extends and opens:
-incomplete concrete name of abstractname = - extends ** functorname with instantiations ** +incomplete concrete name of abstractname = + extends ** functorname with instantiations ** open opens in { body }
Logically, the part "functorname with instantiations" should -really be one of the extends. This is also shown by the fact that +really be one of the extends. This is also shown by the fact that it can have restricted inheritance (concept defined here).

@@ -538,7 +538,7 @@ The table uses the following shorthands for lists of module types:

-The legality of judgements in the body is checked before the judgements +The legality of judgements in the body is checked before the judgements themselves are checked.

@@ -550,7 +550,7 @@ The forms of judgement are explained here. Why are the legality conditions of opens and extends so complicated? The best way to grasp them is probably to consider a simplified logical model of the module system, replacing modules by types and functions. This model could actually -be developed towards treating modules in GF as first-class objects; so far, +be developed towards treating modules in GF as first-class objects; so far, however, this step has not been motivated by any practical needs.

@@ -603,7 +603,7 @@ GADTs and record types: tuples over strings and integers
  • an interface is a labelled record type
  • an instance is a record of the type defined by the interface -
  • a functor, with a module body opening an interface, is a function +
  • a functor, with a module body opening an interface, is a function on its instances
  • the instantiation of a functor is an application of the function to some instance @@ -627,7 +627,7 @@ When an abstract is used as an interface and a concrete as its instance, they are actually reinterpreted so that they match the model. Then the abstract is no longer a GADT, but a system of abstract datatypes, with a record field of type Type for each category, and a function among these types for each -abstract syntax function. A concrete syntax instantiates this record with +abstract syntax function. A concrete syntax instantiates this record with linearization types and linearizations.

    @@ -643,7 +643,7 @@ error: GF provides no way to redefine an inherited constant.

    Simple as the definition of a conflict may sound, it has to take care of the -inheritance hierarchy. A very common pattern of inheritance is the +inheritance hierarchy. A very common pattern of inheritance is the diamond: inheritance from two modules which themselves inherit a common base module. Assume that the base module defines a name f:

    @@ -681,16 +681,16 @@ Inheritance can be restricted. This means that a module can be specified as inheriting only explicitly listed constants, or all constants except ones explicitly listed. The syntax uses constant names in brackets, prefixed by a minus sign in the case of an exclusion list. In the following -configuration, N inherits a,b,c from M1, and all names but d +configuration, N inherits a,b,c from M1, and all names but d from M2

         N = M1 {a,b,c}, M2-{d}
     

    -Restrictions are performed as a part of inheritance linking, module by module: +Restrictions are performed as a part of inheritance linking, module by module: the link is created for a constant if and only if it is both -included in the module and compatible with the restriction. Thus, +included in the module and compatible with the restriction. Thus, for instance, an inadvertent usage can exclude a constant from one module but inherit it from another one. In the following configuration, f is inherited via M1, if M1 inherits it. @@ -709,11 +709,11 @@ exclusion has the effect of creating an ill-formed module: M [f] ===> {fun f : C ;}

    -One might expect inheritance restriction to be transitive: if an included +One might expect inheritance restriction to be transitive: if an included constant b depends on some other constant a, then a should be included automatically. However, this rule would leave to hard-to-detect inheritances. And it could only be applied later in the compilation phase, -when the compiler has not only collected the names defined, but also +when the compiler has not only collected the names defined, but also resolved the names used in definitions.

    @@ -725,9 +725,9 @@ must replicate all restrictions of the functor.

    Opening

    -Opening makes constants from other modules usable in judgements, without +Opening makes constants from other modules usable in judgements, without inheriting them. This means that, unlike inheritance, opening is not -transitive. +transitive.

    @@ -736,7 +736,7 @@ transitive. Opening cannot be restricted as inheritance can, but it can be qualified. This means that the names from the opened modules cannot be used as such, but only as prefixed by a qualifier and a dot (.). The qualifier can be any -identifier, including the name of the module. Here is an example of +identifier, including the name of the module. Here is an example of an opens list:

    @@ -759,7 +759,7 @@ Thus qualification by real module name is always possible, and one and the same
     module can be qualified in different ways at the same time (the latter can
     be useful if you want to be able to change the implementations of some
     constants to a different resource later). Since the qualification with real
    -module name is always possible, it is not possible to "swap" the names of 
    +module name is always possible, it is not possible to "swap" the names of
     modules locally:
     

    @@ -785,8 +785,8 @@ hence not dependent on e.g. types, which are known only at a later phase.
     

    Qualification of names is the main device for avoiding conflicts in name resolution. No other information is used, such as priorities between -modules. However, if a name is defined in different opened modules -but never used in the module body, +modules. However, if a name is defined in different opened modules +but never used in the module body, a conflict does not arise: conflicts arise only when names are used. Also in this respect, opening is thus different from inheritance, where conflicts are checked independently of use. @@ -807,10 +807,10 @@ equations, assigning an instance to every interface. Here is a typical example, displaying the full generality:

    -    concrete FoodsEng of Foods = PhrasesEng ** 
    -      FoodsI-[Pizza] with 
    +    concrete FoodsEng of Foods = PhrasesEng **
    +      FoodsI-[Pizza] with
             (Syntax = SyntaxEng),
    -        (LexFoods = LexFoodsEng) ** 
    +        (LexFoods = LexFoodsEng) **
           open SyntaxEng, ParadigmsEng in {
             lin Pizza = mkCN (mkA "Italian") (mkN "pie") ;
         }
    @@ -856,12 +856,12 @@ have a definition part. While a resource must be complete, an
     parts of judgements are optional.
     

    -An instance is complete with respect to an interface, if it +An instance is complete with respect to an interface, if it gives the definition parts of all oper and param judgements that are omitted in the interface. Giving definitions to judgements that have already been defined in the interface is illegal. Type signatures, on the other hand, can be repeated if the same types -are used. +are used.

    In addition to completing the definitions in an interface, @@ -881,7 +881,7 @@ above variations: } ; oper regNoun : Str -> Noun ; -- no definition } - + instance PosEng of Pos = { param Case = Nom | Gen ; -- definition of Case -- Number and Noun inherited @@ -915,7 +915,7 @@ There are several different forms of judgement, identified by different judgement keywords. Here is a list of all these forms, together with syntax descriptions and the types of modules in which each form can occur. The table moreover indicates whether the judgement has a default value, and -whether it contributes to the name base, i.e. introduces a new +whether it contributes to the name base, i.e. introduces a new name to the scope.

  • @@ -1026,7 +1026,7 @@ Judgements that have default values are rarely used, except lincat

    Introducing a name twice in the same module is an error. In other words, -all judgements that have a "yes" in the name base column, must +all judgements that have a "yes" in the name base column, must have distinct identifiers on their left-hand sides.

    @@ -1043,7 +1043,7 @@ each form. There are moreover two kinds of syntactic sugar common to all forms:

    keyw J ; K ; === keyw J ; keyw K ;
    -
  • the right-hand sides of colon (:) and equality (=) +
  • the right-hand sides of colon (:) and equality (=) can be shared, by using comma (,) as separator of left-hand sides, which must consist of identifiers
    @@ -1065,13 +1065,13 @@ can be correct even though f and g required different function types.

    -Within a module, judgements can occur in any order. In particular, -a name can be used before it is introduced. +Within a module, judgements can occur in any order. In particular, +a name can be used before it is introduced.

    -The explanations of judgement forms refer to the notions +The explanations of judgement forms refer to the notions of type and term (the latter also called expression). -These notions will be explained in detail here. +These notions will be explained in detail here.

    Category declarations, cat

    @@ -1079,7 +1079,7 @@ These notions will be explained in detail here.

    -Category declarations +Category declarations

    cat C G
    @@ -1110,7 +1110,7 @@ and a sequence does not have any separator symbols. As syntactic sugar,
    ( x,y : T ) === ( x : T ) ( y : T )
    -
  • a wildcard can be used for a variable not occurring in types +
  • a wildcard can be used for a variable not occurring in types later in the context,
    ( _ : T ) === ( x : T ) @@ -1142,16 +1142,16 @@ the function type constructor ->. Thus its form is
    (x1 : A1) -> ... -> (xn : An) -> B
    -where Ai are types, called the argument types, and B is a +where Ai are types, called the argument types, and B is a basic type, called the value type of f. The value category of f is the category that forms the type B.

    A syntax tree is formed from f by applying it to a full list of -arguments, so that the result is of a basic type. +arguments, so that the result is of a basic type.

    -A higher-order function is one that has a function type as an +A higher-order function is one that has a function type as an argument. The concrete syntax of GF does not support displaying the bound variables of functions of higher than second order, but they are legal in abstract syntax. @@ -1184,7 +1184,7 @@ The set of def definitions for f can be scattered around the module in which f is introduced as a function. The compiler builds the set of pattern equations in the order in which the equations appear; this order is significant in the case of -overlapping patterns. All equations must appear in the same module in +overlapping patterns. All equations must appear in the same module in which f itself declared.

    @@ -1195,8 +1195,8 @@ syntax, constructor patterns are those of the form C p1 ... pn

    where C is declared as data for some abstract syntax category -(see next section). A variable pattern is either an identifier or -a wildcard. +(see next section). A variable pattern is either an identifier or +a wildcard.

    A common pitfall is to forget to declare a constructor as data, which @@ -1208,7 +1208,7 @@ and in general by using pattern matching. Computation and pattern matchin are explained commonly for abstract and concrete syntax here.

    -In contrast to concrete syntax, abstract syntax computation is +In contrast to concrete syntax, abstract syntax computation is completely symbolic: it does not produce a value, but just another term. Hence it is not an error to have incomplete systems of pattern equations for a function. In addition, the definitions @@ -1224,7 +1224,7 @@ A data constructor definition,

  • defines the functions f1...fn to be constructors of the category C. This means that they are recognized as constructor -patterns when used in function definitions. +patterns when used in function definitions.

    In order for the data constructor definition to be correct, @@ -1240,7 +1240,7 @@ must appear in the same module in which the category is itself defined. There is syntactic sugar for declaring a function as a constructor at the same time as introducing it:

    -data f : A1 -> ... -> An -> C t1 ... tm +data f : A1 -> ... -> An -> C t1 ... tm

    === @@ -1267,8 +1267,8 @@ whereas the primitive notion status is overridden by any of the two others.

    This distinction is relevant for the semantics of abstract syntax, not -for concrete syntax. It shows in the way patterns are treated in -equations in def definitions: a constructor +for concrete syntax. It shows in the way patterns are treated in +equations in def definitions: a constructor in a pattern matches only itself, whereas any other name is treated as a variable pattern, which matches anything. @@ -1324,7 +1324,7 @@ where the type T* is defined as follows depending on T:

    -The second case is relevant for higher-order functions only. It says that +The second case is relevant for higher-order functions only. It says that the linearization type of the value type is extended by adding a string field for each argument types; these fields store the variable symbol used for the binding of each variable. @@ -1356,22 +1356,22 @@ A linearization default definition, lindef C = t

    defines the default linearization of category C, i.e. the function -applicable to a string to make it into an object of the linearization +applicable to a string to make it into an object of the linearization type of C.

    Linearization defaults are invoked when linearizing variable bindings in higher-order abstract syntax. A variable symbol is then presented as a string, which must be converted to correct type in order for -the linearization not to fail with an error. +the linearization not to fail with an error.

    The other use of the defaults is for linearizing metavariables and abstract functions without linearization in the concrete syntax. -In the first case the default linearization is applied to -the string "?X" where X is the unique index -of the metavariable, and in the second case the string is -"[f]" where f is the name of the abstract +In the first case the default linearization is applied to +the string "?X" where X is the unique index +of the metavariable, and in the second case the string is +"[f]" where f is the name of the abstract function with missing linearization.

    @@ -1422,10 +1422,10 @@ The reference linearization is also used for linearizing metavariables which stand in function position. For example the tree f (? x1 x2 .. xn) is linearized as follows. Each of the arguments x1 x2 .. xn is linearized, and after that -the reference linearization of the its category is applied +the reference linearization of the its category is applied to the output of the linearization. The result is a sequence of n strings which are concatenated into a single string. The final string -is the input to the default linearization of the category +is the input to the default linearization of the category for the argument of f. After applying the default linearization we get an object that we could safely pass to f.

    @@ -1441,7 +1441,7 @@ definition is by structural recursion on the type:
  • reference({r1 : R1; ... rn : Rn},o) = reference(R1, o.r1) || reference(R2, o.r2) || ... || reference(Rn, o.rn) Here each call to reference returns either (Just o) or Nothing. -When we compute the reference for a table or a record then we pick +When we compute the reference for a table or a record then we pick the reference for the first expression for which the recursive call gives us Just. If we get Nothing for all of them then the final result is Nothing too. @@ -1510,7 +1510,7 @@ names must be distinct in a module.

    A parameter type may not be recursive, i.e. P itself may not occur in -the contexts of its constructors. This restriction extends to mutual +the contexts of its constructors. This restriction extends to mutual recursion: we say that P depends on the types that occur in the contexts of its constructors and on all types that those types depend on, and state that P may not depend on itself. @@ -1531,7 +1531,7 @@ without defining it, All parameter types are finite, and the GF compiler will internally compute them to lists of parameter values. These lists are formed by traversing the param definitions, usually respecting the -order of constructors in the source code. For records, bibliographical +order of constructors in the source code. For records, bibliographical sorting is applied. However, both the order of traversal of param definitions and the order of fields in a record are specified in a compiler-internal way, which means that the programmer should not @@ -1542,7 +1542,7 @@ The order of the list of parameter values can affect the program in two cases:

      -
    • in the default lindef definition (here), +
    • in the default lindef definition (here), the first value is chosen
    • in course-of-value tables (here), the compiler-internal order is followed @@ -1581,7 +1581,7 @@ which works in two cases

      • the type can be inferred from t (compiler-dependent) -
      • the definition occurs in an instance and the type is given in +
      • the definition occurs in an instance and the type is given in the interface
      @@ -1616,18 +1616,18 @@ concrete syntax code (as explained here).

      One and the same operation name h can be used for different operations, -which have to have different types. For each call of h, the type checker +which have to have different types. For each call of h, the type checker selects one of these operations depending on what type is expected in the context of the call. The syntax of overloaded operation definitions is

      -oper h +oper h = overload {h : T1 = t1 ; ... ; h : Tn = tn}
      Notice that h must be the same in all cases. This format can be used to give the complete implementation; to give just the types, e.g. in an interface, one can use the form
      -oper h +oper h : overload {h : T1 ; ... ; h : Tn}
      The implementation of this operation typing is given by a judgement of @@ -1641,7 +1641,7 @@ A flag definition, flags o = v sets the value of the flag o, to be used when compiling or using -the module. +the module.

      The flag o is an identifier, and the value v is either an identifier @@ -1649,11 +1649,11 @@ or a quoted string.

      Flags are a kind of metadata, which do not strictly belong to the GF -language. For instance, compilers do not necessarily check the +language. For instance, compilers do not necessarily check the consistency of flags, or the meaningfulness of their values. The inheritance of flags is not well-defined; the only certain rule is that flags set in the module body override the settings from -inherited modules. +inherited modules.

      Here are some flags commonly included in grammars. @@ -1672,28 +1672,26 @@ Here are some flags commonly included in grammars.

  • - - - - - - - - - - - -
    concrete
    lexerpredefined lexerlexer before parsingconcrete
    startcat category default target of parsing abstract
    unlexerpredefined unlexerunlexer after linearizationconcrete

    -The possible values of these flags are specified here. +The possible values of these flags are +specified here. Note that +the lexer and unlexer flags are +deprecated. If you need their functionality, you should use supply +them to GF shell commands like so: + +

    put_string -lextext "страви, напої" | parse
    + +A summary of their possible values can be found at the GF shell + reference. +

    Types and expressions

    @@ -1703,14 +1701,14 @@ The possible values of these flags are specified here.

    -Like many dependently typed languages, GF makes no syntactic distinction +Like many dependently typed languages, GF makes no syntactic distinction between expressions and types. An illegal use of a type as an expression or vice versa comes out as a type error. Whether a variable, for instance, stands for a type or an expression value, can only be resolved from its -context of use. +context of use.

    -One practical consequence of the common syntax is that global and local definitions +One practical consequence of the common syntax is that global and local definitions (oper judgements and let expressions, respectively) work in the same way for types and expressions. Thus it is possible to abbreviate a type occurring in a type expression: @@ -1925,7 +1923,7 @@ essentially the functional fragment of the syntax. This fragment comprises two kinds of types: