From 9064c3d7cdf3a6415f495c9d74b08f58dc2c236c Mon Sep 17 00:00:00 2001
From: odanoburu
This document is a reference manual to the GF programming language.
GF, Grammatical Framework, is a special-purpose programming language,
-designed to support definitions of grammars.
+designed to support definitions of grammars.
-This document is not an introduction to GF; such introduction can be
+This document is not an introduction to GF; such introduction can be
found in the GF tutorial available on line on the GF web page,
@@ -118,7 +118,7 @@ do with the language specification.
This manual is meant to be fully compatible with GF version 3.0.
Main discrepancies with version 2.8 are indicated,
-as well as with the reference article on GF,
+as well as with the reference article on GF,
A. Ranta, "Grammatical Framework. A Type Theoretical Grammar Formalism",
@@ -139,17 +139,17 @@ As metalinguistic notation, we will use the symbols
GF is a typed functional language,
-borrowing many of its constructs from ML and Haskell: algebraic datatypes,
+borrowing many of its constructs from ML and Haskell: algebraic datatypes,
higher-order functions, pattern matching. The module system bears resemblance
to ML (functors) but also to object-oriented languages (inheritance).
The type theory used in the abstract syntax part of GF is inherited from
logical frameworks, in particular ALF ("Another Logical Framework"; in a
sense, GF is Yet Another ALF). From ALF comes also the use of dependent
-types, including the use of explicit type variables instead of
+types, including the use of explicit type variables instead of
Hindley-Milner polymorphism.
-The look and feel of GF is close to Java and
+The look and feel of GF is close to Java and
C, due to the use of curly brackets and semicolons in structuring the code;
the expression syntax, however, follows Haskell in using juxtaposition for
function application and parentheses only for grouping.
@@ -173,7 +173,7 @@ abstract syntax, however, is fully recursive.
Even though run-time GF grammars manipulate just nested tuples, at compile
-time these are represented by by the more fine-grained labelled records
+time these are represented by by the more fine-grained labelled records
and finite functions over algebraic datatypes. This enables the programmer
to write on a higher abstraction level, and also adds type distinctions
and hence raises the level of checking of programs.
@@ -187,7 +187,7 @@ The big picture of GF as a programming language for multilingual grammars
explains its principal module structure. Any GF grammar must have an
abstract syntax module; it can in addition have any number of concrete
syntax modules matching that abstract syntax. Before going to details,
-we give a simple example: a module defining the category Overview of GF
A
+we give a simple example: a module defining the category A
of adjectives and one adjective-forming function, the zero-place function
Even. We give the module the name Adj. The GF code for the
module looks as follows:
@@ -202,20 +202,20 @@ module looks as follows:
Here are two concrete syntax modules, one intended for mapping the trees
to English, the other to Swedish. The mappling is defined by
lincat definitions assigning a linearization type to each category,
-and lin definitions assigning a linearization to each function.
+and lin definitions assigning a linearization to each function.
concrete AdjEng of Adj = {
lincat A = {s : Str} ;
lin Even = {s = "even"} ;
}
-
+
concrete AdjSwe of Adj = {
lincat A = {s : AForm => Str} ;
lin Even = {s = table {
- ASg Utr => "jämn" ;
- ASg Neutr => "jämnt" ;
- APl => "jämna"
+ ASg Utr => "jƤmn" ;
+ ASg Neutr => "jƤmnt" ;
+ APl => "jƤmna"
}
} ;
param AForm = ASg Gender | APl ;
@@ -262,7 +262,7 @@ much more module structure than strictly required in top-level grammars.
Inheritance, also known as extension, means that a module can inherit the -contents of one or more other modules to which new judgements are added, +contents of one or more other modules to which new judgements are added, e.g.
@@ -278,7 +278,7 @@ in several concrete syntaxes,
resource MorphoFre = {
param Number = Sg | Pl ;
param Gender = Masc | Fem ;
- oper regA : Str -> {s : Gender => Number => Str} =
+ oper regA : Str -> {s : Gender => Number => Str} =
\fin -> {
s = table {
Masc => table {Sg => fin ; Pl => fin + "s"} ;
@@ -288,7 +288,7 @@ in several concrete syntaxes,
}
-By opening, a module can use the contents of a resource module +By opening, a module can use the contents of a resource module without inheriting them, e.g.
@@ -307,7 +307,7 @@ modules, e.g.
oper Adjective : Type ;
oper even_A : Adjective ;
}
-
+
instance LexiconEng of Lexicon = {
oper Adjective = {s : Str} ;
oper even_A = {s = "even"} ;
@@ -315,7 +315,7 @@ modules, e.g.
Functors i.e. parametrized modules i.e. incomplete modules, defining -a concrete syntax in terms of an interface. +a concrete syntax in terms of an interface.
incomplete concrete AdjI of Adj = open Lexicon in {
@@ -335,7 +335,7 @@ A functor can be instantiated by providing instances of its open interfac
The compilation unit of GF source code is a file that contains a module.
Judgements outside modules are supported only for backward compatibility,
-as explained here.
+as explained here.
Every source file, suffixed .gf, is compiled to a "GF object file",
suffixed .gfo (as of GF Version 3.0 and later). For runtime grammar objects
used for parsing and linearization, a set of .gfo files is linked to
@@ -363,42 +363,42 @@ grammar.pgf
Both .gf and .gfo files are written in the GF source language;
.pgf files are written in a lower-level format. The process of translating
.gf to .gfo consists of name resolution, type annotation,
-partial evaluation, and optimization.
+partial evaluation, and optimization.
There is a great advantage in the possibility to do this
-separately for GF modules and saving the result in .gfo files. The partial
+separately for GF modules and saving the result in .gfo files. The partial
evaluation phase, in particular, is time and memory consuming, and GF libraries
are therefore distributed in .gfo to make their use less arduous.
-In GF before version 3.0, the object files are in a format called .gfc,
+In GF before version 3.0, the object files are in a format called .gfc,
and the multilingual runtime grammar is in a format called .gfcm.
The standard compiler has a built-in make facility, which finds out what
-other modules are needed when compiling an explicitly given module.
-This facility builds a dependency graph and decides which of the involved
+other modules are needed when compiling an explicitly given module.
+This facility builds a dependency graph and decides which of the involved
modules need recompilation (from .gf to .gfo), and for which the
-GF object can be used directly.
+GF object can be used directly.
Names
Each module M defines a set of names, which are visible in M
-itself, in all modules extending M (unless excluded, as explained
+itself, in all modules extending M (unless excluded, as explained
here), and
all modules opening M. These names can stand for abstract syntax
categories and functions, parameter types and parameter constructors,
and operations. All these names live in the same name space, which
means that a name entering a module more than once due to inheritance or
-opening can lead to a conflict. It is specified
+opening can lead to a conflict. It is specified
here how these
conflicts are resolved.
The names of modules live in a name space separate from the other names.
-Even here, all names must be distinct in a set of files compiled to a
+Even here, all names must be distinct in a set of files compiled to a
multilingual grammar. In particular, even files residing in different directories
-must have different names, since GF has no notion of hierarchic
+must have different names, since GF has no notion of hierarchic
module names.
@@ -439,7 +439,7 @@ Any of the parts extends, opens, and body may be empty.
If they are all filled, delimiters and keywords separate the parts in the
following way:
-moduletype name =
+moduletype name =
extends ** open opens in { body }
The part moduletype name looks slightly different if the
@@ -447,18 +447,18 @@ type is concrete or instance: the name intrudes
the type keyword and the name of the module being implemented and which
really belongs to the type of the module:
- concrete name of abstractname
+ concrete name of abstractname
The only exception to the schema of functor syntax
is functor instantiations: the instantiation
list is given in a special way between extends and opens:
-incomplete concrete name of abstractname =
- extends ** functorname with instantiations **
+incomplete concrete name of abstractname =
+ extends ** functorname with instantiations **
open opens in { body }
Logically, the part "functorname with instantiations" should
-really be one of the extends. This is also shown by the fact that
+really be one of the extends. This is also shown by the fact that
it can have restricted inheritance (concept defined here).
@@ -538,7 +538,7 @@ The table uses the following shorthands for lists of module types:
-The legality of judgements in the body is checked before the judgements
+The legality of judgements in the body is checked before the judgements
themselves are checked.
@@ -550,7 +550,7 @@ The forms of judgement are explained here.
Why are the legality conditions of opens and extends so complicated? The best way
to grasp them is probably to consider a simplified logical model of the module
system, replacing modules by types and functions. This model could actually
-be developed towards treating modules in GF as first-class objects; so far,
+be developed towards treating modules in GF as first-class objects; so far,
however, this step has not been motivated by any practical needs.
| concrete | |||
lexer |
-predefined lexer | -lexer before parsing | -concrete | -
startcat |
category | default target of parsing | abstract |
unlexer |
-predefined unlexer | -unlexer after linearization | -concrete | -
-The possible values of these flags are specified here.
+The possible values of these flags are
+specified here. Note that
+the lexer and unlexer flags are
+deprecated. If you need their functionality, you should use supply
+them to GF shell commands like so:
+
+
put_string -lextext "ŃŃŃŠ°Š²Šø, напоŃ" | parse-Like many dependently typed languages, GF makes no syntactic distinction +Like many dependently typed languages, GF makes no syntactic distinction between expressions and types. An illegal use of a type as an expression or vice versa comes out as a type error. Whether a variable, for instance, stands for a type or an expression value, can only be resolved from its -context of use. +context of use.
-One practical consequence of the common syntax is that global and local definitions
+One practical consequence of the common syntax is that global and local definitions
(oper judgements and let expressions, respectively) work in the same way
for types and expressions. Thus it is possible to abbreviate a type
occurring in a type expression:
@@ -1925,7 +1923,7 @@ essentially the functional fragment
of the syntax. This fragment comprises two kinds of types:
cat C (x1 : A1)...(xn : An), including the predefined
categories Int, Float, and String explained here
@@ -1942,17 +1940,17 @@ of the syntax. This fragment comprises two kinds of types:
-When defining basic types, we used the notation +When defining basic types, we used the notation t{x1 = t1,...,xn=tn} for the substitution of values to variables. This is a metalevel notation, which denotes a term that is formed by replacing the free occurrences of -each variable xi by ti. +each variable xi by ti.
These types have six kinds of expressions:
fun f : A
As syntactic sugar, function types have sharing of types and -suppression of variables, in the same way as contexts +suppression of variables, in the same way as contexts (defined here):
( x,y : A ) -> B ===
+( x,y : A ) -> B ===
( x : A ) -> ( y : A ) -> B
\x -> b = \y -> b{x=y}
def f p1 ... pn = t
\x -> c x,
+\x -> c x,
if c : (x : A) -> B
-An expression is in beta-eta-normal form if +An expression is in beta-eta-normal form if
The syntax trees defined by an abstract syntax are well-typed -expressions of basic types in beta-eta normal form. +expressions of basic types in beta-eta normal form. Linearization defined in concrete -syntax applies to all and only these expressions. +syntax applies to all and only these expressions.
There is also a direct definition of syntax trees, which does not @@ -2110,7 +2108,7 @@ where Ai are types and B is a basic type, a syntax tree is an expr
-Thus functions are not concrete syntax objects; however, the +Thus functions are not concrete syntax objects; however, the mappings themselves are expressed as functions, and the source code of a concrete syntax can use functions under the condition that they can be eliminated from the final compiled grammar (which they -can; this is one of the fundamental properties of compilation, as +can; this is one of the fundamental properties of compilation, as explained in more detail in the JFP article).
@@ -2186,20 +2184,20 @@ Concrete syntax thus has the same function types and expression forms as
abstract syntax, specified here. The basic types defined
by categories (cat judgements) are available via grammar reuse
explained here; this also comprises the
-predefined categories Float and String.
+predefined categories Float and String.
In abstract syntax, the conversion rules fiven here define a computational relation -among expressions, but there is no separate notion of a value of +among expressions, but there is no separate notion of a value of computation: the value (the end point) of a computation chain is simply an expression to which no more conversions apply. In general, we are interested in expressions that satisfy the conditions of being syntax trees (as defined here), but there can be many computationally equivalent syntax trees which nonetheless are distinct syntax trees -and hence have different linearizations. The main use of computation +and hence have different linearizations. The main use of computation in abstract syntax is to compare types in dependent type checking.
@@ -2227,7 +2225,7 @@ variables x1,...,xn in where
fun f :
+fun f :
(x1 : A1) -> ... -> (xn : An) -> B
s + t), defined here
-Str arguments)
Str have the following canonical forms:
"foo"
[]
++ t, where s,t : Str
-pre { s ; s1 / p1 ; ... ; sn / pn}, where
Str
@@ -2275,14 +2273,14 @@ Expressions of type Str have the following canonical forms:
For convenience, the notation is overloaded so that tokens are identified with singleton token lists, and there is no separate type of tokens -(this is a change from the JFP article). +(this is a change from the JFP article). The notion of a token is still important for compilation: all tokens introduced by the grammar must be known at compile time. This, in turn, is required by the parsing algorithms used for parsing with GF grammars.
-In addition to string literals, tokens can be formed by a specific +In addition to string literals, tokens can be formed by a specific non-canonical operator:
-Since tokens must be known at compile time, +Since tokens must be known at compile time, the operands of gluing may not depend on run-time variables, as defined here.
@@ -2317,7 +2315,7 @@ spaces separate tokens:
-Notice that there are no empty tokens, but the expression []
+Notice that there are no empty tokens, but the expression []
can be used in a context requiring a token, in particular in gluing expression
below. Since [] denotes an empty token list, the following computation laws
are valid:
@@ -2336,7 +2334,7 @@ Moreover, concatenation and gluing are associative:
-For the programmer, associativity and the empty token laws mean +For the programmer, associativity and the empty token laws mean that the compiler can use them to simplify string expressions. It also means that these laws are respected in pattern matching on strings. @@ -2355,14 +2353,14 @@ This expression can be computed in the context of a subsequent token:
pre { s ; s1 / p1 ; ... ; sn / pn} ++ t
==>
-The matching prefix is defined by comparing the string with the prefix of +The matching prefix is defined by comparing the string with the prefix of the token. If the prefix is a variant list of strings, then it matches the token if any of the strings in the list matches it.
@@ -2373,11 +2371,11 @@ they are not given a subsequent token to compare with, or because the subsequent token depends on a run-time variable.-The prefix-dependent choice expression itself may not depend on run-time +The prefix-dependent choice expression itself may not depend on run-time variables.
-In GF prior to 3.0, a specific type Strs
+In GF prior to 3.0, a specific type Strs
is used for defining prefixes,
instead of just variants of Str.
The order of fields in record types and records is insignificant: two record -types (or records) are equal if they have the same fields, in any order, and a +types (or records) are equal if they have the same fields, in any order, and a record is an object of a record type, if it has type-correct value assignments -for all fields of the record type. -The latter definition implies the even stronger +for all fields of the record type. +The latter definition implies the even stronger principle of record subtyping: a record can have any type that has some -subset of its fields. This principle is explained further +subset of its fields. This principle is explained further here.
@@ -2420,12 +2418,12 @@ All fields in a record must have distinct labels. Thus it is not possible e.g. to "redefine" a field "later" in a record.
-Lexically, labels are identifiers (defined here).
+Lexically, labels are identifiers (defined here).
This is with the exception
of the labels selecting bound variables in the linearization of higher-order
abstract syntax, which have the form $i for an integer i,
-as specified here.
-In source code, these labels should not appear in records fields,
+as specified here.
+In source code, these labels should not appear in records fields,
but only in selections.
@@ -2447,12 +2445,12 @@ The computation rule for projection returns the value assigned to that field:
{ ... ; r = a ; ... }.r ==> a
Notice that the dot notation t.r is also used for qualified names
-as specified here.
+as specified here.
This ambiguity follows tradition and convenience. It is
resolved by the following rules (before type checking):
** { r : A } is a subtype of R
-> A is a subtype of C -> B
--> C is a subtype of A -> C
-The logic of these rules is natural: if a function is returns a value -in a subtype, then this value is a fortiori in the supertype. +The logic of these rules is natural: if a function is returns a value +in a subtype, then this value is a fortiori in the supertype. If a function is defined for some type, then it is a fortiori defined for any subtype.
@@ -2551,7 +2549,7 @@ contravariance, GF implements subtyping for initial segments of integers: As the last rule, subtyping is transitive:
A table type has the form
@@ -2580,11 +2578,11 @@ Canonical expressions of table types are tables, of the form
table { V1 => t1 ; ... ; Vn => tn }
where V1,...,Vn is the complete list of the parameter values of
-the argument type P (defined here), and each ti is
+the argument type P (defined here), and each ti is
an expression of the value type T.
-In addition to explicit enumerations, +In addition to explicit enumerations, tables can be given by pattern matching,
table {p1 => t1 ; ... ; pm => tm}
@@ -2625,11 +2623,11 @@ patterns following the enumeration of all values of the argument type,
this order no longer matters, because no overlap remains between patterns.
-The GF compiler performs table expansion, i.e. an analogue of +The GF compiler performs table expansion, i.e. an analogue of eta expansion defined here, where a table is applied to all values to its argument type:
=> T ==>
+t : P => T ==>
table P [t ! V1 ; ... ; t ! Vn]
Integer and Str.
{ r1 = p1 ; ... ; rn = pn }
binds the union of all variables bound in the subpatterns
- p1,...,pn.
- It matches any value
+ p1,...,pn.
+ It matches any value
{ r1 = V1 ; ... ; rn = Vn ; ...}
- where each pi# matches Vi,
+ where each pi# matches Vi,
and the matching substitution is the union of these substitutions.
-_ binds no variables.
It matches any value, with the empty substitution.
| q binds the intersection of
the variables bound by p and q.
- It matches anything that
+ It matches anything that
either p or q matches, with the first substitution starting
with p matches, from which those
variables that are not bound by both patterns are removed.
@@ -2712,7 +2710,7 @@ The following patterns are only available for the type Str:
-The following pattern is only available for the types Integer
+The following pattern is only available for the types Integer
and Ints n:
Pattern matching is performed in the order in which the branches appear in the source code: the branch of the first matching pattern is followed. -In concrete syntax, the type checker reject sets of patterns that are +In concrete syntax, the type checker reject sets of patterns that are not exhaustive, and warns for completely overshadowed patterns. It also checks the type correctness of patterns with respect to the argument type. In abstract syntax, only type correctness is checked, no exhaustiveness or overshadowing.
-It follows from the definition of record pattern matching +It follows from the definition of record pattern matching that it can utilize partial records: the branch
@@ -2767,7 +2765,7 @@ An expressions of the form
variants {}
@@ -2777,7 +2775,7 @@ thing, e.g. that a certain inflectional form does not exist.
A common wisdom in linguistics is that "there is no free variation", which -refers to the situation where all aspects are taken into account. For +refers to the situation where all aspects are taken into account. For instance, the English negation contraction could be expressed as free variation,
@@ -2794,7 +2792,7 @@ informal and formal style:Since there is not way to choose a particular element from a ``variants` list, free variants is normally not adequate in libraries, nor in grammars meant for -natural language generation. In application grammars +natural language generation. In application grammars meant to parse user input, free variation is a way to avoid cluttering the abstract syntax with semantically insignificant distinctions and even to tolerate some grammatical errors. @@ -2804,7 +2802,7 @@ Permitting
variantsin all types involves a major modification of t semantics of GF expressions. All computation rules have to be lifted to deal with lists of expressions and values. For instance,-t This is done in such a way that @@ -2844,7 +2842,7 @@ Computation is performed by substituting t for x in e:!variants{t1 ; ... ; tn}==> +t!variants{t1 ; ... ; tn}==>variants{t!t1 ; ... ; t!tn}-As syntactic sugar, the type can be omitted if the type checker is +As syntactic sugar, the type can be omitted if the type checker is able to infer it: letx : T = tine ==> e {x = t}letx = tine @@ -2877,27 +2875,27 @@ to bind variables on the left of the equality sign.Fully compiled concrete syntax may not include expressions of function types -except on the outermost level of
linrules, as defined here. -However, +except on the outermost level oflinrules, as defined here. +However, in the source code, and especially inoperdefinitions, functions are the main vehicle of code reuse and abstraction. Thus function types and functions follow the same rules as in abstract syntax, as specified -here. In +here. In particular, the application of a lambda abstract is computed by beta conversion.To ensure the elimination of functions, GF uses a special computation rule -for pushing function applications inside tables, since otherwise run-time +for pushing function applications inside tables, since otherwise run-time variables could block their applications:
-( -Also parameter constructors with non-empty contexts, as defined -here, +Also parameter constructors with non-empty contexts, as defined +here, result in expressions in application form. These expressions are never a problem if their arguments are just constructors, because they can then be translated to integers corresponding to the position of the expression @@ -2905,10 +2903,10 @@ in the enumaration of the values of its type. However, a constructor applied to a run-time variable may need to be converted as follows:table{p1=>f1 ; ... ; - pn=>fn}!e) a +(table{p1=>f1 ; ... ; + pn=>fn}!e) a ==> -table{p1=>f1 a ; ... ; - pn=>fn a}!e +table{p1=>f1 a ; ... ; + pn=>fn a}!e-C...x... ==> -The resulting expression, when processed by table expansion as explained -here, +The resulting expression, when processed by table expansion as explained +here, results in C being applied to just values of the type of x, and the application thereby disappears. @@ -2922,7 +2920,7 @@ application thereby disappears. discipline of GF 2.8.casex of{_ =>C...x}+C...x... ==>casex of{_ =>C...x}-As explained here, +As explained here, abstract syntax modules can be opened as interfaces and concrete syntaxes as their instances. This means that judgements are, as it were, translated in the following way: @@ -2962,7 +2960,7 @@ is available:
In object-oriented terms, the type C itself is protected, whereas MkC is a public constructor of C. Of course, it is possible to -make these constructors overloaded (concept explained here), +make these constructors overloaded (concept explained here), to enable easy access to special cases.
@@ -2983,7 +2981,7 @@ The following concrete syntax types are predefined:The last two types are, in a way, extended by user-written grammars, -since new parameter types can be defined in the way shown here, +since new parameter types can be defined in the way shown here, and every paramater type is also a type. From the point of view of the values of expressions, however, a
paramdeclaration does not extendPType, since all parameter types get compiled to initial @@ -3008,7 +3006,7 @@ literals).The following predefined operations are defined in the resource module
prelude/Predef.gf. Their implementations are defined as -a part of the GF grammar compiler. +a part of the GF grammar compiler.