diff --git a/.gitignore b/.gitignore index 1e25373d7..6025f5a4b 100644 --- a/.gitignore +++ b/.gitignore @@ -54,7 +54,7 @@ doc/gf-bibliography.html doc/gf-developers.html doc/gf-editor-modes.html doc/gf-people.html -doc/gf-reference.html +doc/gf-refman.html doc/gf-shell-reference.html doc/icfp-2012.html download/*.html diff --git a/bin/update_html b/bin/update_html index 84bfa1cdb..8ba13778a 100755 --- a/bin/update_html +++ b/bin/update_html @@ -93,10 +93,18 @@ function render_md_html { html="$2" relroot="$( dirname $md | sed -E 's/^.\///' | sed -E 's/[^/]+/../g' )" + # Look for `toc: true` in metadata (first ten lines of file) + if head -n 10 "$md" | grep --quiet 'toc: true' ; then + toc='--table-of-contents' + else + toc='' + fi + pandoc \ --from=markdown \ --to=html5 \ --standalone \ + $toc \ --template="$template" \ --variable="lang:en" \ --variable="rel-root:$relroot" \ diff --git a/doc/gf-refman.html b/doc/gf-refman.html deleted file mode 100644 index 31456b09a..000000000 --- a/doc/gf-refman.html +++ /dev/null @@ -1,4622 +0,0 @@ - - - - GF Language Reference Manual - - - - - - -
- -

-

- - GF Logo - -

GF Language Reference Manual

- -Aarne Ranta, Krasimir Angelov
June 2014, GF 3.6 -
- -

- - -

-
-

-

- -

-

-This document is a reference manual to the GF programming language. -GF, Grammatical Framework, is a special-purpose programming language, -designed to support definitions of grammars. -

-

-This document is not an introduction to GF; such introduction can be -found in the GF tutorial available on line on the GF web page, -

-

-grammaticalframework.org -

-

-This manual covers only the language, not the GF compiler or -interactive system. We will however make some references to different -compiler versions, if they involve changes of behaviour having to -do with the language specification. -

-

-This manual is meant to be fully compatible with GF version 3.0. -Main discrepancies with version 2.8 are indicated, -as well as with the reference article on GF, -

-

-A. Ranta, "Grammatical Framework. A Type Theoretical Grammar Formalism", -The Journal of Functional Programming 14(2), 2004, pp. 145-189. -

-

-This article will referred to as "the JFP article". -

-

-As metalinguistic notation, we will use the symbols -

- - - -

Overview of GF

-

-GF is a typed functional language, -borrowing many of its constructs from ML and Haskell: algebraic datatypes, -higher-order functions, pattern matching. The module system bears resemblance -to ML (functors) but also to object-oriented languages (inheritance). -The type theory used in the abstract syntax part of GF is inherited from -logical frameworks, in particular ALF ("Another Logical Framework"; in a -sense, GF is Yet Another ALF). From ALF comes also the use of dependent -types, including the use of explicit type variables instead of -Hindley-Milner polymorphism. -

-

-The look and feel of GF is close to Java and -C, due to the use of curly brackets and semicolons in structuring the code; -the expression syntax, however, follows Haskell in using juxtaposition for -function application and parentheses only for grouping. -

-

-To understand the constructs of GF, and especially their limitations in comparison -to general-purpose programming languages, it is essential to keep in mind that -GF is a special-purpose and non-turing-complete language. Every GF program is -ultimately compiled to a multilingual grammar, which consists of an -abstract syntax and a set of concrete syntaxes. The abstract syntax -defines a system of syntax trees, and each concrete syntax defines a -mapping from those syntax trees to nested tuples of strings and integers. -This mapping is compositional, i.e. homomorphic, and moreover -reversible: given a nested tuple, there exists an effective way of finding -the set of syntax trees that map to this tuple. The procedure of applying the -mapping to a tree to produce a tuple is called linearization, and the -reverse search procedure is called parsing. It is ultimately the requirement -of reversibility that restricts GF to be less than turing-complete. This is -reflected in restrictions to recursion in concrete syntax. Tree formation in -abstract syntax, however, is fully recursive. -

-

-Even though run-time GF grammars manipulate just nested tuples, at compile -time these are represented by by the more fine-grained labelled records -and finite functions over algebraic datatypes. This enables the programmer -to write on a higher abstraction level, and also adds type distinctions -and hence raises the level of checking of programs. -

- -

The module system

- -

Top-level and supplementary module structure

-

-The big picture of GF as a programming language for multilingual grammars -explains its principal module structure. Any GF grammar must have an -abstract syntax module; it can in addition have any number of concrete -syntax modules matching that abstract syntax. Before going to details, -we give a simple example: a module defining the category A -of adjectives and one adjective-forming function, the zero-place function -Even. We give the module the name Adj. The GF code for the -module looks as follows: -

-
-    abstract Adj = {
-      cat A ;
-      fun Even : A ;
-    }
-
-

-Here are two concrete syntax modules, one intended for mapping the trees -to English, the other to Swedish. The mappling is defined by -lincat definitions assigning a linearization type to each category, -and lin definitions assigning a linearization to each function. -

-
-    concrete AdjEng of Adj = {
-      lincat A = {s : Str} ;
-      lin Even = {s = "even"} ;
-    }
-
-    concrete AdjSwe of Adj = {
-      lincat A = {s : AForm => Str} ;
-      lin Even = {s = table {
-        ASg Utr   => "jämn" ;
-        ASg Neutr => "jämnt" ;
-        APl       => "jämna"
-        }
-      } ;
-      param AForm = ASg Gender | APl ;
-      param Gender = Utr | Neutr ;
-    }
-
-

-These examples illustrate the main ideas of multilingual grammars: -

- - - - - - -

-The first two ideas form the core of the static checking of GF -grammars, eliminating the possibility of run-time errors in -linearization and parsing. The third idea gives GF the expressive -power needed to map abstract syntax to vastly different languages. -

-

-Abstract and concrete modules are called top-level grammar modules, -since they are the ones that remain in grammar systems at run time. -However, in order to support modular grammar engineering, GF provides -much more module structure than strictly required in top-level grammars. -

-

-Inheritance, also known as extension, means that a module can inherit the -contents of one or more other modules to which new judgements are added, -e.g. -

-
-    abstract MoreAdj = Adj ** {
-      fun Odd : A ;
-    }
-
-

-Resource modules define parameter types and operations usable -in several concrete syntaxes, -

-
-    resource MorphoFre = {
-      param Number = Sg | Pl ;
-      param Gender = Masc | Fem ;
-      oper regA : Str -> {s : Gender => Number => Str} =
-        \fin -> {
-          s = table {
-            Masc => table {Sg => fin ; Pl => fin + "s"} ;
-            Fem  => table {Sg => fin + "e" ; Pl => fin + "es"}
-          }
-        } ;
-    }
-
-

-By opening, a module can use the contents of a resource module -without inheriting them, e.g. -

-
-    concrete AdjFre of Adj = open MorphoFre in {
-      lincat A = {s : Gender => Number => Str} ;
-      lin Even = regA "pair" ;
-    }
-
-

-Interfaces and instances separate the contents of a resource module -to type signatures and definitions, in a way analogous to abstract vs. concrete -modules, e.g. -

-
-    interface Lexicon = {
-      oper Adjective : Type ;
-      oper even_A : Adjective ;
-    }
-
-    instance LexiconEng of Lexicon = {
-      oper Adjective = {s : Str} ;
-      oper even_A = {s = "even"} ;
-    }
-
-

-Functors i.e. parametrized modules i.e. incomplete modules, defining -a concrete syntax in terms of an interface. -

-
-    incomplete concrete AdjI of Adj = open Lexicon in {
-      lincat A = Adjective ;
-      lin Even = even_A ;
-    }
-
-

-A functor can be instantiated by providing instances of its open interfaces. -

-
-    concrete AdjEng of Adj = AdjI with (Lexicon = LexiconEng) ;
-
-

- -

Compilation units

-

-The compilation unit of GF source code is a file that contains a module. -Judgements outside modules are supported only for backward compatibility, -as explained here. -Every source file, suffixed .gf, is compiled to a "GF object file", -suffixed .gfo (as of GF Version 3.0 and later). For runtime grammar objects -used for parsing and linearization, a set of .gfo files is linked to -a single file suffixed .pgf. While .gf and .gfo files may contain -modules of any kinds, a .pgf file always contains a multilingual grammar -with one abstract and a set of concrete syntaxes. -

-

-The following diagram summarizes the files involved in the compilation process. -

-module1.gf module2.gf ... modulen.gf -

-

-==> -

-

-module1.gfo module2.gfo ... modulen.gfo -

-

-==> -

-

-grammar.pgf -

-Both .gf and .gfo files are written in the GF source language; -.pgf files are written in a lower-level format. The process of translating -.gf to .gfo consists of name resolution, type annotation, -partial evaluation, and optimization. -There is a great advantage in the possibility to do this -separately for GF modules and saving the result in .gfo files. The partial -evaluation phase, in particular, is time and memory consuming, and GF libraries -are therefore distributed in .gfo to make their use less arduous. -

-

-In GF before version 3.0, the object files are in a format called .gfc, -and the multilingual runtime grammar is in a format called .gfcm. -

-

-The standard compiler has a built-in make facility, which finds out what -other modules are needed when compiling an explicitly given module. -This facility builds a dependency graph and decides which of the involved -modules need recompilation (from .gf to .gfo), and for which the -GF object can be used directly. -

- -

Names

-

-Each module M defines a set of names, which are visible in M -itself, in all modules extending M (unless excluded, as explained -here), and -all modules opening M. These names can stand for abstract syntax -categories and functions, parameter types and parameter constructors, -and operations. All these names live in the same name space, which -means that a name entering a module more than once due to inheritance or -opening can lead to a conflict. It is specified -here how these -conflicts are resolved. -

-

-The names of modules live in a name space separate from the other names. -Even here, all names must be distinct in a set of files compiled to a -multilingual grammar. In particular, even files residing in different directories -must have different names, since GF has no notion of hierarchic -module names. -

-

-Lexically, names belong to the class of identifiers. An idenfifier is -a letter followed by any number of letters, digits, undercores (_) and -primes ('). Upper- and lower-case letters are treated as distinct. -Nothing dictates the choice of upper or lower-case initials, but -the standard libraries follow conventions similar to Haskell: -

- - -

- -

-

-"Letters" as mentioned in the identifier syntax include all 7-bit ASCII -letters. Iso-latin-1 and Unicode letters are supported in varying degrees -by different tools and platforms, and are hence not recommended in identifiers. -

- -

The structure of a module

-

-Modules of all types have the following structure: -

-moduletype name = extends opens body -
-The part of the module preceding the body is its header. The header -defines the type of the module and tells what other modules it inherits -and opens. The body consists of the judgements that introduce all the new -names defined by the module. -

-

-Any of the parts extends, opens, and body may be empty. -If they are all filled, delimiters and keywords separate the parts in the -following way: -

-moduletype name = - extends ** open opens in { body } -
-The part moduletype name looks slightly different if the -type is concrete or instance: the name intrudes between -the type keyword and the name of the module being implemented and which -really belongs to the type of the module: -
- concrete name of abstractname -
-The only exception to the schema of functor syntax -is functor instantiations: the instantiation -list is given in a special way between extends and opens: -
-incomplete concrete name of abstractname = - extends ** functorname with instantiations ** - open opens in { body } -
-Logically, the part "functorname with instantiations" should -really be one of the extends. This is also shown by the fact that -it can have restricted inheritance (concept defined here). -

- -

Module types, headers, and bodies

-

-The extends and opens parts of a module header are lists of -module names (with possible qualifications, as defined below here). -The first step of type checking a module consists of verifying that -these names stand for modules of approptiate module types. As a rule -of thumb, -

- - -

-However, the precise rules are a little more fine-grained, because -of the presence of interfaces and their instances, and the possibility -to reuse abstract and concrete modules as resources. The following table -gives, for all module types, the possible module types of their extends -and opens, as well as the forms of judgement legal in that module type. -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
module typeextendsopensbody
abstractabstract-cat, fun, def, data
concrete of abstractconcreteresource*lincat, cat, oper, param
resourceresource*resource*oper, param
interfaceresource+resource*oper, param
instance of interfaceresource*resource*oper, param
incomplete concreteconcrete+resource+lincat, cat, oper, param
- -

-

-The table uses the following shorthands for lists of module types: -

- - -

-The legality of judgements in the body is checked before the judgements -themselves are checked. -

-

-The forms of judgement are explained here. -

- -

Digression: the logic of module types

-

-Why are the legality conditions of opens and extends so complicated? The best way -to grasp them is probably to consider a simplified logical model of the module -system, replacing modules by types and functions. This model could actually -be developed towards treating modules in GF as first-class objects; so far, -however, this step has not been motivated by any practical needs. -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
moduleobject and type
abstract A = BA = B : type
concrete C of A = BC = B : A -> S
interface I = BI = B : type
instance J of I = BJ = B : I
incomplete concrete C of A = open I in BC = B : I -> A -> S
concrete K of A = C with (I=J)K = B(J) : A -> S
resource R = BR = B : I
concrete C of A = open R in BC = B(R) : A -> S
- -

-

-A further step of defining modules as first-class objects would use -GADTs and record types: -

- - -

-Slightly unexpectedly, interfaces and instances are easier to understand -in this way than resources - a resource is, indeed, more complex, since -it fuses together an interface and an instance. -

-

- -

-

-When an abstract is used as an interface and a concrete as its instance, they -are actually reinterpreted so that they match the model. Then the abstract is -no longer a GADT, but a system of abstract datatypes, with a record field -of type Type for each category, and a function among these types for each -abstract syntax function. A concrete syntax instantiates this record with -linearization types and linearizations. -

- -

Inheritance

-

-After checking that the extends of a module are of appropriate -module types, the compiler adds the inherited judgements to the -judgements included in the body. The inherited judgements are -not copied entirely, but their names with links to the inherited module. -Conflicts may arise in this process: a name can have two definitions in the combined -pool of inherited and added judgements. Such a conflict is always an -error: GF provides no way to redefine an inherited constant. -

-

-Simple as the definition of a conflict may sound, it has to take care of the -inheritance hierarchy. A very common pattern of inheritance is the -diamond: inheritance from two modules which themselves inherit a common -base module. Assume that the base module defines a name f: -

-
-            N
-          /   \
-        M1     M2
-          \   /
-          Base {f}
-
-

-Now, N inherits f from both M1 and M2, so is there a -conflict? The answer in GF is no, because the "two" f's are in the -end the same: the one defined in Base. The situation is thus simpler -than in multiple inheritance in languages like C++, because definitions in -GF are immutable: neither M1 nor M2 can possibly have changed -the definition of f given in Base. In practice, the compiler manages -inheritance through hierarchy in a very simple way, by just always creating -a link not to the immediate parent, but the original ancestor; this ancestor -can be read from the link provided by the immediate parent. Here is how -links are created from source modules by the compiler: -

-
-    Base {f}
-    M1 {m1}   ===>  M1 {Base.f, m1}
-    M2 {m2}   ===>  M2 {Base.f, m2}
-    N  {n}    ===>  N  {Base.f, M1.m1, M2.m2, n}
-
-

-

- -

-

-Inheritance can be restricted. This means that a module can be specified -as inheriting only explicitly listed constants, or all constants -except ones explicitly listed. The syntax uses constant names in brackets, -prefixed by a minus sign in the case of an exclusion list. In the following -configuration, N inherits a,b,c from M1, and all names but d -from M2 -

-
-    N = M1 {a,b,c}, M2-{d}
-
-

-Restrictions are performed as a part of inheritance linking, module by module: -the link is created for a constant if and only if it is both -included in the module and compatible with the restriction. Thus, -for instance, an inadvertent usage can exclude a constant from one module -but inherit it from another one. In the following -configuration, f is inherited via M1, if M1 inherits it. -

-
-    N = M1 [a,b,c], M2-[f]
-
-

-Unintended inheritance may cause problems later in compilation, in the -judgement-level dependency analysis phase. For instance, suppose a function -f has category C as its type in M, and we only include f. The -exclusion has the effect of creating an ill-formed module: -

-
-    abstract M = {cat C ; fun f : C ;}
-    M [f]   ===> {fun f : C ;}
-
-

-One might expect inheritance restriction to be transitive: if an included -constant b depends on some other constant a, then a should be -included automatically. However, this rule would leave to hard-to-detect -inheritances. And it could only be applied later in the compilation phase, -when the compiler has not only collected the names defined, but also -resolved the names used in definitions. -

-

-Yet another pitfall with restricted inheritance is that it must be stated -for each module separately. For instance, a concrete syntax of an abstract -must exclude all those names that the abstract does, and a functor instantiation -must replicate all restrictions of the functor. -

- -

Opening

-

-Opening makes constants from other modules usable in judgements, without -inheriting them. This means that, unlike inheritance, opening is not -transitive. -

-

- -

-

-Opening cannot be restricted as inheritance can, but it can be qualified. -This means that the names from the opened modules cannot be used as such, but -only as prefixed by a qualifier and a dot (.). The qualifier can be any -identifier, including the name of the module. Here is an example of -an opens list: -

-
-    open A, (X = XSLTS), (Y = XSLTS), B
-
-

-If A defines the constant a, it can be accessed by the names -

-
-    a  A.a
-
-

-If XSLTS defines the constant x, it can be accessed by the names -

-
-    X.x  Y.x  XSLTS.x
-
-

-Thus qualification by real module name is always possible, and one and the same -module can be qualified in different ways at the same time (the latter can -be useful if you want to be able to change the implementations of some -constants to a different resource later). Since the qualification with real -module name is always possible, it is not possible to "swap" the names of -modules locally: -

-
-    open (A=B), (B=A) -- NOT POSSIBLE!
-
-

-The list of qualifiers names and module names in a module header may -thus not contain any duplicates. -

- -

Name resolution

-

- -

-

-Name resolution is the compiler phase taking place after inheritance -linking. It qualifies all names occurring in the definition parts of judgements -(that is, just excluding the defined names themselves) with the names of -the modules they come from. If a name can come from different modules (that is, -not from their common ancestor), a conflict is reported; this decision is -hence not dependent on e.g. types, which are known only at a later phase. -

-

-Qualification of names is the main device for avoiding conflicts in -name resolution. No other information is used, such as priorities between -modules. However, if a name is defined in different opened modules -but never used in the module body, -a conflict does not arise: conflicts arise only -when names are used. Also in this respect, opening is thus different from -inheritance, where conflicts are checked independently of use. -

-

-As usual, inner scope has priority in name resolution. This means that -if an identifier is in scope as a bound variable, it will not be -interpreted as a constant, unless qualified by a module name -(variable bindings are explained here). -

- -

Functor instantiations

-

-We have dealt with the principles of module headers, inheritance, and -names in a general way that applies to all module types. The exception -is functor instantiations, that have an extra part of the instantiating -equations, assigning an instance to every interface. Here is a typical -example, displaying the full generality: -

-
-    concrete FoodsEng of Foods = PhrasesEng **
-      FoodsI-[Pizza] with
-        (Syntax = SyntaxEng),
-        (LexFoods = LexFoodsEng) **
-      open SyntaxEng, ParadigmsEng in {
-        lin Pizza = mkCN (mkA "Italian") (mkN "pie") ;
-    }
-
-

-(The example is modified from Section 5.9 in the GF Tutorial.) -

-

-The instantiation syntax is similar to qualified opens. The left-hand-side -names must be interfaces, the right-hand-side names their instances. (Recall -that abstract can be use as interface and concrete as its -instance.) Inheritance from the functor can be restricted, typically -in the purpose of defining some excluded functions in language-specific -ways in the module body. -

- -

Completeness

-

- -

-

-(This section refers to the forms of judgement introduced here.) -

-

-A concrete is complete with respect to an abstract, if it -contains a lincat definition for every cat declaration, and -a lin definition for every fun declaration. -

-

-The same completeness criterion applies to functor instantiations. -It is not possible to use a partial functor instantiation, leading -to another functor. -

-

-Functors do not need to be complete in the sense concrete modules need. -The missing definitions can then be provided in the body of each -functor instantiation. -

-

-A resource is complete, if all its oper and param judgements -have a definition part. While a resource must be complete, an -interface need not. For an interface, it is the definition -parts of judgements are optional. -

-

-An instance is complete with respect to an interface, if it -gives the definition parts of all oper and param judgements -that are omitted in the interface. Giving definitions to judgements -that have already been defined in the interface is illegal. -Type signatures, on the other hand, can be repeated if the same types -are used. -

-

-In addition to completing the definitions in an interface, -its instance may contain other judgements, but these must all -be complete with definitions. -

-

-Here is an example of an instance and its interface showing the -above variations: -

-
-    interface Pos = {
-      param Case ;                 -- no definition
-      param Number = Sg | Pl ;     -- definition given
-      oper Noun : Type = {         -- relative definition given
-        s : Number => Case => Str
-      } ;
-      oper regNoun : Str -> Noun ; -- no definition
-    }
-
-    instance PosEng of Pos = {
-      param Case = Nom | Gen ;     -- definition of Case
-                                   -- Number and Noun inherited
-      oper regNoun = \dog -> {     -- type of regNoun inherited
-        s = table {                -- definition of regNoun
-          Sg => table {
-            Nom => dog
-            -- etc
-          }
-        } ;
-      oper house_N : Noun =        -- new definition
-        regNoun "house" ;
-    }
-
-

- -

Judgements

- -

Overview of the forms of judgement

-

- -

-

-A module body in GF is a set of judgements. Judgements are -definitions or declarations, sometimes combinations of the two; the -common feature is that every judgement introduces a name, which is -available in the module and whenever the module is extended or opened. -

-

-There are several different forms of judgement, identified by different -judgement keywords. Here is a list of all these forms, together -with syntax descriptions and the types of modules in which each form can occur. -The table moreover indicates whether the judgement has a default value, and -whether it contributes to the name base, i.e. introduces a new -name to the scope. -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
judgementwheremoduledefaultbase
cat C GG contextabstractN/Ayes
fun f : AA typeabstractN/Ayes
def f ps = tf fun, ps patterns, t termabstractyesno
data C = f | ... | gC cat, f...g funabstractyesno
lincat C = TC cat, T typeconcrete*yesyes
lin f = tf fun, t termconcrete*noyes
lindef C = tC cat, t termconcrete*yesno
linref C = tC cat, t termconcrete*yesno
printname cat C = tC cat, t termconcrete*yesno
printname fun f = tf fun, t termconcrete*yesno
param P = C| ... | DC...D constructorsresource*N/Ayes
oper f : T = tT type, t termresource*N/Ayes
flags o = vo flag, v valueallyesN/A
- -

-

-Judgements that have default values are rarely used, except lincat and -flags, which often need values different from the defaults. -

-

-Introducing a name twice in the same module is an error. In other words, -all judgements that have a "yes" in the name base column, must -have distinct identifiers on their left-hand sides. -

-

-All judgement end with semicolons (;). -

-

-In addition to the syntax given in the table, many of the forms have -syntactic sugar. This sugar will be explained below in connection to -each form. There are moreover two kinds of syntactic sugar common to all forms: -

- - -

-These conventions, like all syntactic sugar, are performed at an -early compilation phase, directly after parsing. This means that e.g. -

-
-    lin f,g = \x -> x ;
-
-

-can be correct even though f and g required different -function types. -

-

-Within a module, judgements can occur in any order. In particular, -a name can be used before it is introduced. -

-

-The explanations of judgement forms refer to the notions -of type and term (the latter also called expression). -These notions will be explained in detail here. -

- -

Category declarations, cat

-

- -

-

-Category declarations -

-cat C G -
-define the basic types of abstract syntax. -A basic type is formed from a category by giving values to all variables -in the context G. If the context is empty, the -basic type looks the same as the category itself. Otherwise, application -syntax is used: -
-C a1...an -
-

- -

Hypotheses and contexts

-

- -

-

-A context is a sequence of hypotheses, i.e. variable-type pairs. -A hypothesis is written -

-( x : T ) -
-and a sequence does not have any separator symbols. As syntactic sugar, -

- - -

-An abstract syntax has dependent types, if any of its categories has -a non-empty context. -

- -

Function declarations, fun

-

-Function declarations, -

- fun f : T -
-define the syntactic constructors of abstract -syntax. The type T of f -is built built from basic types (formed from categories) by using -the function type constructor ->. Thus its form is -
- (x1 : A1) -> ... -> (xn : An) -> B -
-where Ai are types, called the argument types, and B is a -basic type, called the value type of f. The value category of -f is the category that forms the type B. -

-

-A syntax tree is formed from f by applying it to a full list of -arguments, so that the result is of a basic type. -

-

-A higher-order function is one that has a function type as an -argument. The concrete syntax of GF does not support displaying the -bound variables of functions of higher than second order, but they are -legal in abstract syntax. -

-

-An abstract syntax is context-free, if it has neither dependent -types nor higher-order functions. Grammars with context-free abstract -syntax are an important subclass of GF, with more limited complexity -than full GF. Whether the concrete syntax is context-free in the sense -of the Chomsky hierarchy is independent of the context-freeness of -the abstract syntax. -

- -

Function definitions, def

-

-Function definitions, -

- def f p1 ... pn = t -
-where f is a fun function and pi# are patterns, -impose a relation of definitional equality on abstract syntax -trees. They form the basis of computation, which is used -when comparing whether two types are equal; this notion is relevant -only if the types are dependent. Computation can also be used for -the normalization of syntax trees, which applies even in -context-free abstract syntax. -

-

-The set of def definitions for f can be scattered around -the module in which f is introduced as a function. The compiler -builds the set of pattern equations in the order in which the -equations appear; this order is significant in the case of -overlapping patterns. All equations must appear in the same module in -which f itself declared. -

-

-The syntax of patterns will be specified here, commonly for -abstract and concrete syntax. In abstract -syntax, constructor patterns are those of the form -

- C p1 ... pn -
-where C is declared as data for some abstract syntax category -(see next section). A variable pattern is either an identifier or -a wildcard. -

-

-A common pitfall is to forget to declare a constructor as data, which -causes it to be interpreted as a variable pattern in definitions. -

-

-Computation is performed by applying definitions and beta conversions, -and in general by using pattern matching. Computation and pattern matching -are explained commonly for abstract and concrete syntax here. -

-

-In contrast to concrete syntax, abstract syntax computation is -completely symbolic: it does not produce a value, but just another -term. Hence it is not an error to have incomplete systems of -pattern equations for a function. In addition, the definitions -can be recursive, which means that computation can fail to terminate; -this can never happen in concrete syntax. -

- -

Data constructor definitions, data

-

-A data constructor definition, -

- data C = f1 | ... | fn -
-defines the functions f1...fn to be constructors -of the category C. This means that they are recognized as constructor -patterns when used in function definitions. -

-

-In order for the data constructor definition to be correct, -f1...fn must be functions with C as their value category. -

-

-The complete set of constructors for a category C is the union of -all its data constructor definitions. Thus a category can be "extended" -by new constructors afterwards. However, all these constructor definitions -must appear in the same module in which the category is itself defined. -

-

-There is syntactic sugar for declaring a function as a constructor at -the same time as introducing it: -

-data f : A1 -> ... -> An -> C t1 ... tm -

-

- === -

-

-fun f : A1 -> ... -> An -> C t1 ... tm ; - data C = f -

-

- -

The semantic status of an abstract syntax function

-

-There are three possible statuses for a function declared in a fun judgement: -

- - -

-The "constructor" and "defined" statuses are in contradiction with each other, -whereas the primitive notion status is overridden by any of the two others. -

-

-This distinction is relevant for the semantics of abstract syntax, not -for concrete syntax. It shows in the way patterns are treated in -equations in def definitions: a constructor -in a pattern matches only itself, whereas -any other name is treated as a variable pattern, which matches -anything. -

- -

Linearization type definitions, lincat

-

-A linearization type definition, -

- lincat C = T -
-defines the type of linearizations of trees whose type has category C. -Type dependences have no effect on the linearization type. -

-

-The type T must be a legal linearization type, which means that it -is a record type whose fields have either parameter types, the type Str -of strings, or table or record types of these. In particular, function types -may not appear in T. A detailed explanation of types in concrete syntax -will be given here. -

-

-If K is the concrete syntax of an abstract syntax A, then K must -define the linearization type of all categories declared in A. However, -the definition can be omitted from the source code, in which case the default -type {s : Str} is used. -

- -

Linearization definitions, lin

-

-A linearization definition, -

- lin f = t -
-defines the linearizations function of function f, i.e. the function -used for linearizing trees formed by f. -

-

-The type of t must be the homomorphic image of the type of f. -In other words, if -

- fun f : A1 -> ... -> An -> A -
-then -
- lin f : A1* -> ... -> An* -> A* -
-where the type T* is defined as follows depending on T: -

- - -

-The second case is relevant for higher-order functions only. It says that -the linearization type of the value type is extended by adding a string field -for each argument types; these fields store the variable symbol used for -the binding of each variable. -

-

- -

-

-Since the arguments of a function argument are treated as bare strings, -orders higher than the second are irrelevant for concrete syntax. -

-

-There is syntactic sugar for binding the variables of the linearization -of a function on the left-hand side: -

- lin f p = t === lin f = \p -> t -
-The pattern p must be either a variable or a wildcard (_); this is -what the syntax of lambda abstracts (\p -> t) requires. -

- -

Linearization default definitions, lindef

-

- -

-

-A linearization default definition, -

- lindef C = t -
-defines the default linearization of category C, i.e. the function -applicable to a string to make it into an object of the linearization -type of C. -

-

-Linearization defaults are invoked when linearizing variable bindings -in higher-order abstract syntax. A variable symbol is then presented -as a string, which must be converted to correct type in order for -the linearization not to fail with an error. -

-

-The other use of the defaults is for linearizing metavariables -and abstract functions without linearization in the concrete syntax. -In the first case the default linearization is applied to -the string "?X" where X is the unique index -of the metavariable, and in the second case the string is -"[f]" where f is the name of the abstract -function with missing linearization. - -

-

-Usually, linearization defaults are generated by using the default -rule that "uses the symbol itself for every string, and the -first value of the parameter type for every parameter". The precise -definition is by structural recursion on the type: -

- - -

-The notion of the first value of a parameter type (#1(P)) is defined -below. -

- -

Linearization reference definitions, linref

-

- -

-

-A linearization reference definition, -

- linref C = t -
-defines the reference linearization of category C, i.e. the function -applicable to an object of the linearization type of C to make it into a string. -

-

-The reference linearization is always applied to the top-level node -of the abstract syntax tree. For example when we linearize the -tree f x1 x2 .. xn, then we first apply f -to its arguments which gives us an object of the linearization type of -its category. After that we apply the reference linearization -for the same category to get a string out of the object. This -is particularly useful when the linearization type of C -contains discontious constituents. In this case usually the reference -linearization glues the constituents together to produce an -intuitive linearization string. -

-

-The reference linearization is also used for linearizing metavariables -which stand in function position. For example the tree -f (? x1 x2 .. xn) is linearized as follows. Each -of the arguments x1 x2 .. xn is linearized, and after that -the reference linearization of the its category is applied -to the output of the linearization. The result is a sequence of n -strings which are concatenated into a single string. The final string -is the input to the default linearization of the category -for the argument of f. After applying the default linearization -we get an object that we could safely pass to f. -

-

-Usually, linearization references are generated by using the -rule that "picks the first string in the linearization type". The precise -definition is by structural recursion on the type: -

- -Here each call to reference returns either (Just o) or Nothing. -When we compute the reference for a table or a record then we pick -the reference for the first expression for which the recursive call -gives us Just. If we get Nothing for -all of them then the final result is Nothing too. - - -

Printname definitions, printname cat and printname fun

-

-A category printname definition, -

- printname cat C = s -
-defines the printname of category C, i.e. the name used -in some abstract syntax information shown to the user. -

-

-Likewise, a function printname definition, -

- printname fun f = s -
-defines the printname of function f, i.e. the name used -in some abstract syntax information shown to the user. -

-

-The most common use of printnames is in the interactive syntax -editor, where printnames are displayed in menus. It is possible -e.g. to adapt them to each language, or to embed HTML tooltips -in them (as is used in some HTML-based editor GUIs). -

-

-Usually, printnames are generated automatically from the symbol -and/or concrete syntax information. -

- -

Parameter type definitions, param

-

- -

-

-A parameter type definition, -

- param P = C1 G1 | ... | Cn Gn -
-defines a parameter type P with the parameter constructors -C1...Cn, with their respective contexts G1...Gn. -

-

- -

-

-Contexts have the same syntax as in cat judgements, explained -here. Since dependent types are not available in -parameter type definitions, the use of variables is never -necessary. The types in the context must themselves be parameter types, -which are defined as follows: -

- - -

-The names defined by a parameter type definition include both the -type name P and the constructor names Ci. Therefore all these -names must be distinct in a module. -

-

-A parameter type may not be recursive, i.e. P itself may not occur in -the contexts of its constructors. This restriction extends to mutual -recursion: we say that P depends on the types that occur -in the contexts of its constructors and on all types that those types -depend on, and state that P may not depend on itself. -

-

-In an interface module, it is possible to declare a parameter type -without defining it, -

- param P ; -
-

- -

Parameter values

-

- -

-

-All parameter types are finite, and the GF compiler will internally -compute them to lists of parameter values. These lists are formed by -traversing the param definitions, usually respecting the -order of constructors in the source code. For records, bibliographical -sorting is applied. However, both the order of traversal of param -definitions and the order of fields in a record are specified -in a compiler-internal way, which means that the programmer should not -rely on any particular order. -

-

-The order of the list of parameter values can affect the program in two -cases: -

- - -

-The first usage implies that, if lindef definitions are essential for -the application, they should be given manually. The second usage implies that -course-of-value tables should be avoided in hand-written GF code. -

-

-In run-time grammar generation, all parameter values are translated to -integers denotions positions in these parameter lists. -

- -

Operation definitions, oper

-

-An operation definition, -

- oper h : T = t -
-defines an operation h of type T, with the computation rule -
- h ==> t -
-The type T can be any concrete syntax type, including function -types of any order. The term t must have the type T, as -defined here. -

-

-As syntactic sugar, the type can be omitted, -

- oper h = t -
-which works in two cases -

- - -

-It is also possible to give the type and the definition separately: -

-oper h : T ; oper h = t === - oper h : T = t -
-The order of the type part and the definition part is free, and there -can be other judgements in between. However, they must occur in the -same resource module for it to be complete (as defined here). -In an interface module, it is enough to give the type. -

-

-When only the definition is given, it is possible to use a shorthand -similar to lin judgements: -

-oper h p = t === oper h = \p -> t -
-The pattern p is either a variable or a wildcard (_). -

-

-Operation definitions may not be recursive, not even mutually recursive. -This condition ensures that functions can in the end be eliminated from -concrete syntax code (as explained here). -

- -

Operation overloading

-

- -

-

-One and the same operation name h can be used for different operations, -which have to have different types. For each call of h, the type checker -selects one of these operations depending on what type is expected in the -context of the call. The syntax of overloaded operation definitions is -

-oper h - = overload {h : T1 = t1 ; ... ; h : Tn = tn} -
-Notice that h must be the same in all cases. -This format can be used to give the complete implementation; to give just -the types, e.g. in an interface, one can use the form -
-oper h - : overload {h : T1 ; ... ; h : Tn} -
-The implementation of this operation typing is given by a judgement of -the first form. The order of branches need not be the same. -

- -

Flag definitions, flags

-

-A flag definition, -

- flags o = v -
-sets the value of the flag o, to be used when compiling or using -the module. -

-

-The flag o is an identifier, and the value v is either an identifier -or a quoted string. -

-

-Flags are a kind of metadata, which do not strictly belong to the GF -language. For instance, compilers do not necessarily check the -consistency of flags, or the meaningfulness of their values. -The inheritance of flags is not well-defined; the only certain rule -is that flags set in the module body override the settings from -inherited modules. -

-

-Here are some flags commonly included in grammars. -

- - - - - - - - - - - - - - - - - - - -
flagvaluedescriptionmodule
codingcharacter encodingencoding used in string literalsconcrete
startcatcategorydefault target of parsingabstract
- -

-

-The possible values of these flags are -specified here. Note that -the lexer and unlexer flags are -deprecated. If you need their functionality, you should use supply -them to GF shell commands like so: - -

put_string -lextext "страви, напої" | parse
- -A summary of their possible values can be found at the GF shell - reference. -

-

- -

Types and expressions

- -

Overview of expression forms

-

- -

-

-Like many dependently typed languages, GF makes no syntactic distinction -between expressions and types. An illegal use of a type as an expression or -vice versa comes out as a type error. Whether a variable, for instance, -stands for a type or an expression value, can only be resolved from its -context of use. -

-

-One practical consequence of the common syntax is that global and local definitions -(oper judgements and let expressions, respectively) work in the same way -for types and expressions. Thus it is possible to abbreviate a type -occurring in a type expression: -

-
-    let A = {s : Str ; b : Bool} in A -> A -> A
-
-

-Type and other expressions have a system of precedences. The following table -summarizes all expression forms, from the highest to the lowest precedence. -Some expressions are moreover left- or right-associative. -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
precexpression exampleexplanation
7cconstant or variable
7Typethe type of types
7PTypethe type of parameter types
7Strthe type of strings/token lists
7"foo"string literal
7123integer literal
70.123floating point literal
7?metavariable
7[]empty token list
7[C a b]list category
7["foo bar"]token list
7{"s : Str ; n : Num}record type
7{"s = "foo" ; n = Sg}record
7<Sg,Fem,Gen>tuple
7<n : Num>type-annotated expression
6 leftt.rprojection or qualification
5 leftf afunction application
5table {Sg => [] ; _ => "xs"}table
5table P [a ; b ; c]course-of-values table
5case n of {Sg => [] ; _ => "xs"}case expression
5variants {"color" ; "colour"}free variation
5pre {vowel => "an" ; _ => "a"}prefix-dependent choice
4 leftt ! vtable selection
4 leftA * Btuple type
4 leftR ** {b : Bool}record (type) extension
3 leftt + stoken gluing
2 leftt ++ stoken list concatenation
1 right\x,y -> tfunction abstraction ("lambda")
1 right\\x,y => ttable abstraction
1 right(x : A) -> Bdependent function type
1 rightA -> Bfunction type
1 rightP => Ttable type
1 rightlet x = v in tlocal definition
1t where {x = v}local definition
1in M.C "foo"rule by example
- -

-

-Any expression in parentheses ((exp)) is in the highest -precedence class. -

- -

The functional fragment: expressions in abstract syntax

-

- -

-

-The expression syntax is the same in abstract and concrete syntax, although -only a part of the syntax is actually usable in well-typed expressions in -abstract syntax. An abstract syntax is essentially used for defining a set -of types and a set of functions between those types. Therefore it needs -essentially the functional fragment -of the syntax. This fragment comprises two kinds of types: -

- - - - -

-When defining basic types, we used the notation -t{x1 = t1,...,xn=tn} -for the substitution of values to variables. This is a metalevel notation, -which denotes a term that is formed by replacing the free occurrences of -each variable xi by ti. -

-

-These types have six kinds of expressions: -

- - - - - - - - - - - - -

- -

-

-The notion of binding is defined for occurrences of variables in -subexpressions as follows: -

- - -

-As syntactic sugar, function types have sharing of types and -suppression of variables, in the same way as contexts -(defined here): -

- - -

-There is analogous syntactic sugar for constant functions, -

-\_ -> t === \x -> t -
-where x does not occur in t, and for multiple lambda abstractions: -
-\p,q -> t === \p -> \q -> t -
-where p and q are variables or wild cards (_). -

- -

Conversions

-

- -

-

-Among expressions, there is a relation of definitional equality defined -by four conversion rules: -

- - - - - - - - -

-Pattern matching substitution used in delta conversion -is defined here. -

-

-An expression is in beta-eta-normal form if -

- - -

-Notice that the iteration of eta expansion would lead to an expression not -in beta-normal form. -

- -

Syntax trees

-

- -

-

-The syntax trees defined by an abstract syntax are well-typed -expressions of basic types in beta-eta normal form. -Linearization defined in concrete -syntax applies to all and only these expressions. -

-

-There is also a direct definition of syntax trees, which does not -refer to beta and eta conversions: keeping in mind that a type always has -the form -

-(x1 : A1) -> ... -> (xn : An) -> B -
-where Ai are types and B is a basic type, a syntax tree is an expression -
-b t1 ... tn : B' -
-where -

- - - -

Predefined types in abstract syntax

-

- -

-

-GF provides three predefined categories for abstract syntax, with predefined -expressions: -

- - - - - - - - - - - - - - - - - -
categoryexpressions
Intinteger literals, e.g. 123
Floatfloating point literals, e.g. 12.34
Stringstring literals, e.g. "foo"
- -

-

-These categories take no arguments, and they can be used as basic -types in the same way as if they were introduced in cat judgements. -However, it is not legal to define fun functions that have any -of these types as value type: their only well-typed expressions are -literals as defined in the above table. -

- -

Overview of expressions in concrete syntax

-

- -

-

-Concrete syntax is about defining mappings from abstract syntax trees -to concrete syntax objects. These objects comprise -

- - -

-Thus functions are not concrete syntax objects; however, the -mappings themselves are expressed as functions, and the source code -of a concrete syntax can use functions under the condition that -they can be eliminated from the final compiled grammar (which they -can; this is one of the fundamental properties of compilation, as -explained in more detail in the JFP article). -

-

-Concrete syntax thus has the same function types and expression forms as -abstract syntax, specified here. The basic types defined -by categories (cat judgements) are available via grammar reuse -explained here; this also comprises the -predefined categories Float and String. -

- -

Values, canonical forms, and run-time variables

-

-In abstract syntax, the conversion rules fiven here -define a computational relation -among expressions, but there is no separate notion of a value of -computation: the value (the end point) of a computation chain is -simply an expression to which no more conversions apply. In general, -we are interested in expressions that satisfy the conditions of being -syntax trees (as defined here), but there can be many computationally -equivalent syntax trees which nonetheless are distinct syntax trees -and hence have different linearizations. The main use of computation -in abstract syntax is to compare types in dependent type checking. -

-

-In concrete syntax, the notion of values is central. At run time, -we want to compute the values of linearizations; at compile time, we want -to perform partial evaluation, which computes expressions as far as -possible. -To specify what happens -in computation we therefore have to distinguish between canonical forms -and other forms of expressions. The canonical forms are defined separately -for each form of type, whereas the other forms may usually produce expressions -of any type. -

-

- - -

-

-What is done at compile time is the elimination of any noncanonical forms, -except for those depending on run-time variables. Run-time variables are -the same as the argument variables of linearization rules, i.e. the -variables x1,...,xn in -

-lin f = \ x1,...,xn -> t -
-where -
-fun f : -(x1 : A1) -> ... -> (xn : An) -> B -
-Notice that this definition refers to the eta-expanded linearization term, -which has one abstracted variable for each argument type of f. These variables -are not necessarily explicit in GF source code, but introduced by the compiler. -

-

-Since certain expression forms should be eliminated in compilation but -cannot be eliminated if run-time variables appear in them, errors can -appear late in compilation. This is an issue with the following -expression forms: -

- - - -

Token lists, tokens, and strings

-

- -

-

-The most prominent basic type is Str, the type of token lists. -This type is often sloppily referred to as the type of strings; -but it should be kept in mind that the objects of Str are -lists of strings rather than single strings. -

-

-Expressions of type Str have the following canonical forms: -

- - -

-For convenience, the notation is overloaded so that tokens are identified -with singleton token lists, and there is no separate type of tokens -(this is a change from the JFP article). -The notion of a token -is still important for compilation: all tokens introduced by -the grammar must be known at compile time. This, in turn, is -required by the parsing algorithms used for parsing with GF grammars. -

-

-In addition to string literals, tokens can be formed by a specific -non-canonical operator: -

- - -

- -

-

-Being noncanonical, gluing is equipped with a computation rule: -string literals are glued by forming a new string literal, and -empty token lists can be ignored: -

- - -

-Since tokens must be known at compile time, -the operands of gluing may not depend on run-time variables, -as defined here. -

-

-As syntactic sugar, token lists can be given as bracketed string literals, where -spaces separate tokens: -

- - -

-Notice that there are no empty tokens, but the expression [] -can be used in a context requiring a token, in particular in gluing expression -below. Since [] denotes an empty token list, the following computation laws -are valid: -

- - -

-Moreover, concatenation and gluing are associative: -

- - -

-For the programmer, associativity and the empty token laws mean -that the compiler can use them to simplify string expressions. -It also means that these laws are respected in pattern matching -on strings. -

-

-A prime example of prefix-dependent choice operation is the following -approximative expression for the English indefinite article: -

-
-  pre {
-    ("a" | "e" | "i" | "o") => "an" ;
-    _ => "a"
-  } ;
-
-

-This expression can be computed in the context of a subsequent token: -

- - -

-The matching prefix is defined by comparing the string with the prefix of -the token. If the prefix is a variant list of strings, then it matches -the token if any of the strings in the list matches it. -

-

-The computation rule can sometimes be applied at compile time, but it general, -prefix-dependent choices need to be passed to the run-time grammar, because -they are not given a subsequent token to compare with, or because the -subsequent token depends on a run-time variable. -

-

-The prefix-dependent choice expression itself may not depend on run-time -variables. -

-

- There is an older syntax for prefix-dependent choice, - namely: pre { s ; s1 / p1 ; ... ; sn / pn}. This syntax - will not accept strings as patterns. -

-

-In GF prior to 3.0, a specific type Strs -is used for defining prefixes, -instead of just variants of Str. -

- -

Records and record types

-

-A record is a collection of objects of possibly different types, -accessible by projections from the record with labels pointing -to these objects. A record is also itself an object, whose type is -a record type. Record types have the form -

- { r1 : A1 ; ... ; rn : An } -
-where n >= 0, each Ai is a type, and the labels ri are -distinct. A record of this type has the form -
- { r1 = a1 ; ... ; rn = an } -
-where each #aii : "Aii. A limiting case is the empty record type -{}, which has the object {}, the empty record. -

-

-The fields of a record type are its parts of the form r : A, -also called typings. The fields of a record are of the form -r = a, also called value assignments. Value assignments -may optionally indicate the type, as in r : A = a. -

-

-The order of fields in record types and records is insignificant: two record -types (or records) are equal if they have the same fields, in any order, and a -record is an object of a record type, if it has type-correct value assignments -for all fields of the record type. -The latter definition implies the even stronger -principle of record subtyping: a record can have any type that has some -subset of its fields. This principle is explained further -here. -

-

-All fields in a record must have distinct labels. Thus it is not possible -e.g. to "redefine" a field "later" in a record. -

-

-Lexically, labels are identifiers (defined here). -This is with the exception -of the labels selecting bound variables in the linearization of higher-order -abstract syntax, which have the form $i for an integer i, -as specified here. -In source code, these labels should not appear in records fields, -but only in selections. -

-

-Labels occur only in syntactic positions where they cannot be confused with -constants or variables. Therefore it is safe to write, as in Prelude, -

-
-    ss : Str -> {s : Str} = \s -> {s = s} ;
-
-

-A projection is an expression of the form -

- t.r -
-where t must be a record and r must be a label defined in it. -The type of the projection is the type of that field. -The computation rule for projection returns the value assigned to that field: -
-{ ... ; r = a ; ... }.r ==> a -
-Notice that the dot notation t.r is also used for qualified names -as specified here. -This ambiguity follows tradition and convenience. It is -resolved by the following rules (before type checking): -

-
    -
  1. if t is a bound variable or a constant in scope, - t.r is type-checked as a projection -
  2. otherwise, t.r is type-checked as a qualified name -
- -

-As syntactic sugar, types and values can be shared: -

- - -

-Another syntactic sugar are tuple types and tuples, which are translated -by endowing their unlabelled fields by the labels p1, p2,... in the -order of appearance of the fields: -

- - -

-A record extension is formed by adding fields to a record or a record type. -The general syntax involves two expressions, -

- R ** S -
-The result is a record type or a record with a union of the fields of R and -S. It is therefore well-formed if -

- -(Since GF version 3.6) If R and S are record objects, -then the labels in them need not be disjoint. Labels defined in -S are then given priority, so that record extensions in fact -works as record update. A common pattern of using this feature -is -
-  lin F x ... = x ** {r = ... x.r ...}
-
-where x is a record with many fields, just one of which is -updated. Following the normal binding conditions, x.r on the -right hand side still refers to the old value of the r field. - - - -

Subtyping

-

- -

-

-The possibility of having superfluous fields in a record forms the basis of -the subtyping relation. -That A is a subtype of B means that a : A implies a : B. -This is clearly satisfied for records with superfluous fields: -

- - -

-The GF grammar compiler extends subtyping to function types by covariance -and contravariance: -

- - -

-The logic of these rules is natural: if a function is returns a value -in a subtype, then this value is a fortiori in the supertype. -If a function is defined for some type, then it is a fortiori defined -for any subtype. -

-

-In addition to the well-known principles of record subtyping and co- and -contravariance, GF implements subtyping for initial segments of integers: -

- - -

-As the last rule, subtyping is transitive: -

- - - -

Tables and table types

-

- -

-

-One of the most characteristic constructs of GF is tables, also called -finite functions. That these functions are finite means that it -is possible to finitely enumerate all argument-value pairs; this, in -turn, is possible because the argument types are finite. -

-

-A table type has the form -

-P => T -
-where P must be a parameter type in the sense defined here, whereas -T can be any type. -

-

-Canonical expressions of table types are tables, of the form -

-table { V1 => t1 ; ... ; Vn => tn } -
-where V1,...,Vn is the complete list of the parameter values of -the argument type P (defined here), and each ti is -an expression of the value type T. -

-

-In addition to explicit enumerations, -tables can be given by pattern matching, -

-table {p1 => t1 ; ... ; pm => tm} -
-where p1,....,pm is a list of patterns that covers all values of type P. -Each pattern pi may bind some variables, on which the expression ti -may depend. A complete account of patterns and pattern matching is given -here. -

-

-A course-of-values table omits the patterns and just lists all -values. It uses the enumeration of all values of the argument type P -to pair the values with arguments: -

-table P [t1 ; ... ; tn] -
-This format is not recommended for GF source code, since the -ordering of parameter values is not specified and therefore a -compiler-internal decision. -

-

-The argument type can be indicated in ordinary tables as well, which is -sometimes helpful for type inference: -

-table P { ... } -
-

-

-The selection operator !, applied to a table t and to an expression -v of its argument type -

-t ! v -
-returns the first pattern matching result from t with v, as defined -here. The order of patterns is thus significant as long as the -patterns contain variables or wildcards. When the compiler reorders the -patterns following the enumeration of all values of the argument type, -this order no longer matters, because no overlap remains between patterns. -

-

-The GF compiler performs table expansion, i.e. an analogue of -eta expansion defined here, where a table is applied to all -values to its argument type: -

-t : P => T ==> -table P [t ! V1 ; ... ; t ! Vn] -
-As syntactic sugar, one-branch tables can be written in a way similar to -lambda abstractions: -
-\\p => t === table {p => t } -
-where p is either a variable or a wildcard (_). Multiple bindings -can be abbreviated: -
-\\p,q => t === \\p => \\q => t -
-Case expressions are syntactic sugar for selections: -
-case e of {...} === table {...} ! e -
-

- -

Pattern matching

-

- -

-

-We will list all forms of patterns that can be used in table branches. -We define their variable bindings and matching substitutions. -

-

-We start with the patterns available for all parameter types, as well -as for the types Integer and Str. -

- - -

-The following patterns are only available for the type Str: -

- - -

-The following pattern is only available for the types Integer -and Ints n: -

- - -

-All patterns must be linear: the same pattern variable may occur -only once in them. This is what makes it straightforward to speak -about unions of binding sets and substitutions. -

-

-Pattern matching is performed in the order in which the branches -appear in the source code: the branch of the first matching pattern is followed. -In concrete syntax, the type checker reject sets of patterns that are -not exhaustive, and warns for completely overshadowed patterns. -It also checks the type correctness of patterns with respect to the -argument type. In abstract syntax, only type correctness is checked, -no exhaustiveness or overshadowing. -

-

-It follows from the definition of record pattern matching -that it can utilize partial records: the branch -

-
-    {g = Fem} => t
-
-

-in a table of type {g : Gender ; n : Number} => T means the same as -

-
-    {g = Fem ; n = _} => t
-
-

-Variables in regular expression patterns -are always bound to the first match, which is the first -in the sequence of binding lists. For example: -

- - - -

Free variation

-

-An expressions of the form -

-variants {t1 ; ... ; tn} -
-where all ti are of the same type T, has itseld type T. -This expression presents ti,...,tn as being in free variation: -the choice between them is not determined by semantics or parameters. -A limiting case is -
-variants {} -
-which encodes a rule saying that there is no way to express a certain -thing, e.g. that a certain inflectional form does not exist. -

-

-A common wisdom in linguistics is that "there is no free variation", which -refers to the situation where all aspects are taken into account. For -instance, the English negation contraction could be expressed as free variation, -

-
-    variants {"don't" ; "do" ++ "not"}
-
-

-if only semantics is taken into account, but if stylistic aspects are included, -then the proper formulation might be with a parameter distinguishing between -informal and formal style: -

-
-    case style of {Informal => "don't" ; Formal => "do" ++ "not"}
-
-

-Since there is not way to choose a particular element from a ``variants` list, -free variants is normally not adequate in libraries, nor in grammars meant for -natural language generation. In application grammars -meant to parse user input, free variation is a way to avoid cluttering the -abstract syntax with semantically insignificant distinctions and even to -tolerate some grammatical errors. -

-

-Permitting variants in all types involves a major modification of the -semantics of GF expressions. All computation rules have to be lifted to -deal with lists of expressions and values. For instance, -

-t ! variants {t1 ; ... ; tn} ==> -variants {t ! t1 ; ... ; t ! tn} -
-This is done in such a way that -variation does not distribute to records (or other product-like structures). -For instance, variants of records, -

-
-    variants {{s = "Auto" ; g = Neutr} ; {s = "Wagen" ; g = Masc}}
-
-

-is not the same as a record of variants, -

-
-    {s = variants {"Auto" ; "Wagen"} ; g = variants {Neutr ; Masc}}
-
-

-Variants of variants are flattened, -

-variants {...; variants {t1 ;...; tn} ;...} -==> -variants {...; t1 ;...; tn ;...} -
-and singleton variants are eliminated, -
-variants {t} ==> t -
-

- -

Local definitions

-

-A local definition, i.e. a let expression has the form -

-let x : T = t in e -
-The type of x must be T, which also has to be the type of t. -Computation is performed by substituting t for x in e: -
-let x : T = t in e ==> e {x = t} -
-As syntactic sugar, the type can be omitted if the type checker is -able to infer it: -
-let x = t in e -
-It is possible to compress several local definitions into one block: -
-let x : T = t ; y : U = u in e -=== -let x : T = t in let y : U = u in e -
-Another notational variant is a definition block appearing after the main -expression: -
-e where {...} === let {...} in e -
-Curly brackets are obligatory in the where form, and can -also be optionally used in the let form. -

-

-Since a block of definitions is treated as syntactic sugar -for a nested let expression, a constant must be defined before it -is used: the scope is not mutual, as in a module body. -Furthermore, unlike in lin and oper definitions, it is not possible -to bind variables on the left of the equality sign. -

- -

Function applications in concrete syntax

-

- -

-

-Fully compiled concrete syntax may not include expressions of function types -except on the outermost level of lin rules, as defined here. -However, -in the source code, and especially in oper definitions, functions -are the main vehicle of code reuse and abstraction. Thus function types and -functions follow the same rules as in abstract syntax, as specified -here. In -particular, the application of a lambda abstract is computed by beta conversion. -

-

-To ensure the elimination of functions, GF uses a special computation rule -for pushing function applications inside tables, since otherwise run-time -variables could block their applications: -

-(table {p1 => f1 ; ... ; - pn => fn } ! e) a - ==> - table {p1 => f1 a ; ... ; - pn => fn a} ! e -
-Also parameter constructors with non-empty contexts, as defined -here, -result in expressions in application form. These expressions are never -a problem if their arguments are just constructors, because they can then -be translated to integers corresponding to the position of the expression -in the enumaration of the values of its type. -However, a constructor -applied to a run-time variable may need to be converted as follows: -
-C...x... ==> case x of {_ => C...x} -
-The resulting expression, when processed by table expansion as explained -here, -results in C being applied to just values of the type of x, and the -application thereby disappears. -

- -

Reusing top-level grammars as resources

-

- -

-

-This section is valid for GF 3.0, which abandons the "lock field" -discipline of GF 2.8. -

-

-As explained here, -abstract syntax modules can be opened as interfaces -and concrete syntaxes as their instances. This means that judgements are, -as it were, translated in the following way: -

- - -

-Notice that the value T of lincat definitions is not disclosed -in the translation. This means that the type C remains abstract: the -only ways of building an object of type C are the operations f -obtained from fun and lin rules. -

-

-The purpose of keeping linearization types abstract is to enforce -grammar checking via type checking. This means that any well-typed -operation application is also well-typed in the sense of the original -grammar. If the types were disclosed, then we could for instance easily -confuse all categories that have the linearization -type {s : Str}. Yet another reason is that revealing the types -makes it impossible for the library programmers to change their type -definitions afterwards. -

-

-Library writers may occasionally want to have access to the values of -linearization types. The way to make it possible is to add an extra -construction operation to a module in which the linearization type -is available: -

-
-    oper MkC : T -> C = \x -> x
-
-

-In object-oriented terms, the type C itself is protected, whereas -MkC is a public constructor of C. Of course, it is possible to -make these constructors overloaded (concept explained here), -to enable easy access to special cases. -

- -

Predefined concrete syntax types

-

- -

-

-The following concrete syntax types are predefined: -

- - -

-The last two types are, in a way, extended by user-written grammars, -since new parameter types can be defined in the way shown here, -and every paramater type is also a type. From the point of view of the values -of expressions, however, a param declaration does not extend -PType, since all parameter types get compiled to initial -segments of integers. -

-

-Notice the difference between the concrete syntax types -Str and Integer on the one hand, and the abstract -syntax categories String and Int, on the other. -As concrete syntax types, the latter are treated in -the same way as any reused categories: their objects -can be formed by using syntax trees (string and integer -literals). -

-

-The type name Integer replaces in GF 3.0 the name Int, -to avoid confusion with the abstract syntax type and to be analogous -with the Str vs. String distinction. -

- -

Predefined concrete syntax operations

-

-The following predefined operations are defined in the resource module -prelude/Predef.gf. Their implementations are defined as -a part of the GF grammar compiler. -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
operationtypeexplanation
PBoolPTypePTrue | PFalse
ErrorTypethe empty type
IntTypethe type of integers
IntsInteger -> Typethe type of integers from 0 to n
errorStr -> Errorforms error message
lengthStr -> Intlength of string
dropInteger -> Str -> Strdrop prefix of length
takeInteger -> Str -> Strtake prefix of length
tkInteger -> Str -> Strdrop suffix of length
dpInteger -> Str -> Strtake suffix of length
eqIntInteger -> Integer -> PBooltest if equal integers
lessIntInteger -> Integer -> PBooltest order of integers
plusInteger -> Integer -> Integeradd integers
eqStrStr -> Str -> PBooltest if equal strings
occurStr -> Str -> PBooltest if occurs as substring
occursStr -> Str -> PBooltest if any char occurs
show(P : Type) -> P -> Strconvert param to string
read(P : Type) -> Str -> Pconvert string to param
toStr(L : Type) -> L -> Strfind the "first" string
nonExistStra special token marking
-non-existing morphological forms
BINDStra special token marking
-that the surrounding tokens should not
-be separated by space
SOFT_BINDStra special token marking
-that the surrounding tokens may not
-be separated by space
SOFT_SPACEStra special token marking
-that the space between the surrounding tokens
-is optional
CAPITStra special token marking
-that the first character in the next token
-should be capitalized
ALL_CAPITStra special token marking
-that the next word should be
-in all capital letters
- -

-

-Compilation eliminates these operations, and they may therefore not -take arguments that depend on run-time variables. -

-

-The module Predef is included in the opens list of all -modules, and therefore does not need to be opened explicitly. -

- -

Flags and pragmas

- -

Some flags and their values

-

- -

-

-The flag coding in concrete syntax sets the character encoding -used in the grammar. Internally, GF uses unicode, and .pgf files -are always written in UTF8 encoding. The presence of the flag -coding=utf8 prevents GF from encoding an already encoded -file. -

-

- -

-

-The flag startcat in abstract syntax sets the default start category for -parsing, random generation, and any other grammar operation that depends -on category. Its legal values are the categories defined or inherited in -the abstract syntax. -

-

- - -

- -

Compiler pragmas

-

-Compiler pragmas are a special form of comments prefixed with --#. -Currently GF interprets the following pragmas. -

- - - - - - - - - -
pragmaexplanation
-path=PATHpath list for searching modules
- -

-

-For instance, the line -

-
-    --# -path=.:present:prelude:/home/aarne/GF/tmp
-
-

-in the top of FILE.gf causes the GF compiler, when invoked on FILE.gf, -to search through the current directory (.) and the directories -present, prelude, and /home/aarne/GF/tmp, in this order. -If a directory DIR is not found relative to the working directory, -$(GF_LIB_PATH)/DIR is searched. $GF_LIB_PATH -can be a colon-separated list of directories, in which case each directory -in the list contributes to the search path expansion. - -

- -

Alternative grammar input formats

-

-While the GF language as specified in this document is the most versatile -and powerful way of writing GF grammars, there are several other formats -that a GF compiler may make available for users, either to get started -with small grammars or to semiautomatically convert grammars from other -formats to GF. Here are the ones supported by GF 2.8 and 3.0. -

- -

Old GF without modules

-

- -

-

-Before GF compiler version 2.0, there was no module system, and -all kinds of judgement could be written in all files, without -any headers. This format is still available, and the compiler -(version 2.8) detects automatically if a file is in the current -or the old format. However, the old format is not recommended -because of pure modularity and missing separate compilation, -and also because libraries are not available, since the old -and the new format cannot be mixed. With version 2.8, grammars -in the old format can be converted to modular grammar with the -command -

-
-    > import -o FILE.gf
-
-

-which rewrites the grammar divided into three files: -an abstract, a concrete, and a resource module. -

- -

Context-free grammars

-

-A quick way to write a GF grammar is to use the context-free format, -also known as BNF. Files of this form are recognized by the suffix -.cf. Rules in these files have the form -

-Label . Cat ::= (String | Cat)* ; -
-where Label and Cat are identifiers and String quoted strings. -

-

-There is a shortcut form generating labels automatically, -

-Cat ::= (String | Cat)* ; -
-In the shortcut form, vertical bars (|) can be used to give -several right-hand-sides at a time. An empty right-hand side -means the singleton of an empty sequence, and not an empty union. -

-

-Just like old-style GF files (previous section), contex-free grammar -files can be converted to modular GF by using the -o option to -the compiler in GF 2.8. -

- -

Extended BNF grammars

-

-Extended BNF (FILE.ebnf) -goes one step further from the shortcut notation of previous section. -The rules have the form -

-Cat ::= RHS ; -
-where an RHS can be any regular expression -built from quoted strings and category symbols, in the following ways: -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
RHS itemexplanation
Catnonterminal
Stringterminal
RHS RHSsequence
RHS | RHSalternatives
RHS ?optional
RHS *repetition
RHS +non-empty repetition|
- -

-

-Parentheses are used to override standard precedences, where -| binds weaker than sequencing, which binds weaker than the unary operations. -

-

-The compiler generates not only labels, but also new categories corresponding -to the regular expression combinations actually in use. -

-

-Just like .cf files (previous section), .ebnf -files can be converted to modular GF by using the -o option to -the compiler in GF 2.8. -

- -

Example-based grammars

-

-Example-based grammars (.gfe) provide a way to use -resource grammar libraries without having to know the names -of functions in them. The compiler works as a preprocessor, -saving the result in a (.gf) file, which can be compiled -as usual. -

-

-If a library is implemented as an abstract and concrete syntax, -it can be used for parsing. Calls of library functions can therefore -be formed by parsing strings in the library. GF has an expression -format for this, -

-in C String -
-where C is the category in which to parse (it can be qualified by -the module name) and the string is the input to parser. Expressions -of this form are replaced by the syntax trees that result. These -trees are always type-correct. If several parses are found, all but -the first one are given in comments. -

-

-Here is an example, from GF/examples/animal/: -

-
-    --# -resource=../../lib/present/LangEng.gfc
-    --# -path=.:present:prelude
-
-    incomplete concrete QuestionsI of Questions = open Lang in {
-      lincat
-        Phrase = Phr ;
-        Entity = N ;
-        Action = V2 ;
-      lin
-        Who  love_V2 man_N           = in Phr "who loves men" ;
-        Whom man_N love_V2           = in Phr "whom does the man love" ;
-        Answer woman_N love_V2 man_N = in Phr "the woman loves men" ;
-    }
-
-

-The resource pragma shows the grammar that is used for parsing -the examples. -

-

-Notice that the variables love_V2, man_N, etc, are -actually constants in the library. In the resulting rules, such as -

-
-    lin Whom = \man_N -> \love_V2 ->
-      PhrUtt NoPConj (UttQS (UseQCl TPres ASimul PPos
-        (QuestSlash whoPl_IP (SlashV2 (DetCN (DetSg (SgQuant
-          DefArt)NoOrd)(UseN man_N)) love_V2)))) NoVoc ;
-
-

-those constants are nonetheless treated as variables, following -the normal binding conventions, as stated here. -

- -

The grammar of GF

-

-The following grammar is actually used in the parser of GF, although we have -omitted -some obsolete rules still included in the parser for backward compatibility -reasons. -

-

-This document was automatically generated by the BNF-Converter. It was generated together with the lexer, the parser, and the abstract syntax module, which guarantees that the document matches with the implementation of the language (provided no hand-hacking has taken place). -

- -

The lexical structure of GF

- -

Identifiers

-

-Identifiers Ident are unquoted strings beginning with a letter, -followed by any combination of letters, digits, and the characters _ ' -reserved words excluded. -

- -

Literals

-

-Integer literals Integer are nonempty sequences of digits. -

-

-String literals String have the form -"x"}, where x is any sequence of any characters -except " unless preceded by \. -

-

-Double-precision float literals Double have the structure -indicated by the regular expression digit+ '.' digit+ ('e' ('-')? digit+)? i.e.\ -two sequences of digits separated by a decimal point, optionally -followed by an unsigned or negative exponent. -

- -

Reserved words and symbols

-

-The set of reserved words is the set of terminals appearing in the grammar. Those reserved words that consist of non-letter characters are called symbols, and they are treated in a different way from those that are similar to identifiers. The lexer follows rules familiar from languages like Haskell, C, and Java, including longest match and spacing conventions. -

-

-The reserved words used in GF are the following: -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
PTypeStrStrsType
abstractcasecatconcrete
datadefflagsfun
inincompleteinstanceinterface
letlinlincatlindef
linrefofopenoper
parampreprintnameresource
strstabletransfervariants
wherewith
- -

-

-The symbols used in GF are the following: -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
;=:->
{}**,
()[]
-.|?
<>@!
*+++\
=>_$/
- -

- -

Comments

-

-Single-line comments begin with --.Multiple-line comments are enclosed with {- and -}. -

- -

The syntactic structure of GF

-

-Non-terminals are enclosed between < and >. -The symbols -> (production), | (union) -and eps (empty rule) belong to the BNF notation. -All other symbols are terminals. -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Grammar->[ModDef]
[ModDef]->eps
|ModDef [ModDef]
ModDef->ModDef ;
|ComplMod ModType = ModBody
ModType->abstract Ident
|resource Ident
|interface Ident
|concrete Ident of Ident
|instance Ident of Ident
|transfer Ident : Open -> Open
ModBody->Extend Opens { [TopDef] }
|[Included]
|Included with [Open]
|Included with [Open] ** Opens { [TopDef] }
|[Included] ** Included with [Open]
|[Included] ** Included with [Open] ** Opens { [TopDef] }
[TopDef]->eps
|TopDef [TopDef]
Extend->[Included] **
|eps
[Open]->eps
|Open
|Open , [Open]
Opens->eps
|open [Open] in
Open->Ident
|( QualOpen Ident )
|( QualOpen Ident = Ident )
ComplMod->eps
|incomplete
QualOpen->eps
[Included]->eps
|Included
|Included , [Included]
Included->Ident
|Ident [ [Ident] ]
|Ident - [ [Ident] ]
Def->[Name] : Exp
|[Name] = Exp
|Name [Patt] = Exp
|[Name] : Exp = Exp
TopDef->cat [CatDef]
|fun [FunDef]
|data [FunDef]
|def [Def]
|data [DataDef]
|param [ParDef]
|oper [Def]
|lincat [PrintDef]
|lindef [Def]
|linref [Def]
|lin [Def]
|printname cat [PrintDef]
|printname fun [PrintDef]
|flags [FlagDef]
CatDef->Ident [DDecl]
|[ Ident [DDecl] ]
|[ Ident [DDecl] ] { Integer }
FunDef->[Ident] : Exp
DataDef->Ident = [DataConstr]
DataConstr->Ident
|Ident . Ident
[DataConstr]->eps
|DataConstr
|DataConstr | [DataConstr]
ParDef->Ident = [ParConstr]
|Ident = ( in Ident )
|Ident
ParConstr->Ident [DDecl]
PrintDef->[Name] = Exp
FlagDef->Ident = Ident
[Def]->Def ;
|Def ; [Def]
[CatDef]->CatDef ;
|CatDef ; [CatDef]
[FunDef]->FunDef ;
|FunDef ; [FunDef]
[DataDef]->DataDef ;
|DataDef ; [DataDef]
[ParDef]->ParDef ;
|ParDef ; [ParDef]
[PrintDef]->PrintDef ;
|PrintDef ; [PrintDef]
[FlagDef]->FlagDef ;
|FlagDef ; [FlagDef]
[ParConstr]->eps
|ParConstr
|ParConstr | [ParConstr]
[Ident]->Ident
|Ident , [Ident]
Name->Ident
|[ Ident ]
[Name]->Name
|Name , [Name]
LocDef->[Ident] : Exp
|[Ident] = Exp
|[Ident] : Exp = Exp
[LocDef]->eps
|LocDef
|LocDef ; [LocDef]
Exp6->Ident
|Sort
|String
|Integer
|Double
|?
|[ ]
|data
|[ Ident Exps ]
|[ String ]
|{ [LocDef] }
|< [TupleComp] >
|< Exp : Exp >
|( Exp )
Exp5->Exp5 . Label
|Exp6
Exp4->Exp4 Exp5
|table { [Case] }
|table Exp6 { [Case] }
|table Exp6 [ [Exp] ]
|case Exp of { [Case] }
|variants { [Exp] }
|pre { Exp ; [Altern] }
|strs { [Exp] }
|Ident @ Exp6
|Exp5
Exp3->Exp3 ! Exp4
|Exp3 * Exp4
|Exp3 ** Exp4
|Exp4
Exp1->Exp2 + Exp1
|Exp2
Exp->Exp1 ++ Exp
|\ [Bind] -> Exp
|\ \ [Bind] => Exp
|Decl -> Exp
|Exp3 => Exp
|let { [LocDef] } in Exp
|let [LocDef] in Exp
|Exp3 where { [LocDef] }
|in Exp5 String
|Exp1
Exp2->Exp3
[Exp]->eps
|Exp
|Exp ; [Exp]
Exps->eps
|Exp6 Exps
Patt2->_
|Ident
|Ident . Ident
|Integer
|Double
|String
|{ [PattAss] }
|< [PattTupleComp] >
|( Patt )
Patt1->Ident [Patt]
|Ident . Ident [Patt]
|Patt2 *
|Ident @ Patt2
|- Patt2
|Patt2
Patt->Patt | Patt1
|Patt + Patt1
|Patt1
PattAss->[Ident] = Patt
Label->Ident
|$ Integer
Sort->Type
|PType
|Str
|Strs
[PattAss]->eps
|PattAss
|PattAss ; [PattAss]
[Patt]->Patt2
|Patt2 [Patt]
Bind->Ident
|_
[Bind]->eps
|Bind
|Bind , [Bind]
Decl->( [Bind] : Exp )
|Exp4
TupleComp->Exp
PattTupleComp->Patt
[TupleComp]->eps
|TupleComp
|TupleComp , [TupleComp]
[PattTupleComp]->eps
|PattTupleComp
|PattTupleComp , [PattTupleComp]
Case->Patt => Exp
[Case]->Case
|Case ; [Case]
Altern->Exp / Exp
[Altern]->eps
|Altern
|Altern ; [Altern]
DDecl->( [Bind] : Exp )
|Exp6
[DDecl]->eps
|DDecl [DDecl]
- -
- - diff --git a/doc/gf-refman.md b/doc/gf-refman.md new file mode 100644 index 000000000..0462cf185 --- /dev/null +++ b/doc/gf-refman.md @@ -0,0 +1,2770 @@ +--- +title: GF Language Reference Manual +author: Aarne Ranta, Krasimir Angelov +date: June 2014, GF 3.6 +toc: true +--- + +This document is a reference manual to the GF programming language. GF, +Grammatical Framework, is a special-purpose programming language, +designed to support definitions of grammars. + +This document is not an introduction to GF; such introduction can be +found in the GF tutorial available on line on the GF web page, + +[`grammaticalframework.org`](http://grammaticalframework.org) + +This manual covers only the language, not the GF compiler or interactive +system. We will however make some references to different compiler +versions, if they involve changes of behaviour having to do with the +language specification. + +This manual is meant to be fully compatible with GF version 3.0. Main +discrepancies with version 2.8 are indicated, as well as with the +reference article on GF, + +A. Ranta, \"Grammatical Framework. A Type Theoretical Grammar +Formalism\", *The Journal of Functional Programming* 14(2), 2004, pp. +145-189. + +This article will referred to as \"the JFP article\". + +As metalinguistic notation, we will use the symbols + +- *a* === *b* to say that *a* is syntactic sugar for *b* +- *a* ==\> *b* to say that *a* is computed (or compiled) to *b* + + +Overview of GF +-------------- + +GF is a typed functional language, borrowing many of its constructs from +ML and Haskell: algebraic datatypes, higher-order functions, pattern +matching. The module system bears resemblance to ML (functors) but also +to object-oriented languages (inheritance). The type theory used in the +abstract syntax part of GF is inherited from logical frameworks, in +particular ALF (\"Another Logical Framework\"; in a sense, GF is Yet +Another ALF). From ALF comes also the use of dependent types, including +the use of explicit type variables instead of Hindley-Milner +polymorphism. + +The look and feel of GF is close to Java and C, due to the use of curly +brackets and semicolons in structuring the code; the expression syntax, +however, follows Haskell in using juxtaposition for function application +and parentheses only for grouping. + +To understand the constructs of GF, and especially their limitations in +comparison to general-purpose programming languages, it is essential to +keep in mind that GF is a special-purpose and non-turing-complete +language. Every GF program is ultimately compiled to a **multilingual +grammar**, which consists of an **abstract syntax** and a set of +**concrete syntaxes**. The abstract syntax defines a system of **syntax +trees**, and each concrete syntax defines a mapping from those syntax +trees to **nested tuples** of strings and integers. This mapping is +**compositional**, i.e. **homomorphic**, and moreover **reversible**: +given a nested tuple, there exists an effective way of finding the set +of syntax trees that map to this tuple. The procedure of applying the +mapping to a tree to produce a tuple is called **linearization**, and +the reverse search procedure is called **parsing**. It is ultimately the +requirement of reversibility that restricts GF to be less than +turing-complete. This is reflected in restrictions to recursion in +concrete syntax. Tree formation in abstract syntax, however, is fully +recursive. + +Even though run-time GF grammars manipulate just nested tuples, at +compile time these are represented by by the more fine-grained labelled +records and finite functions over algebraic datatypes. This enables the +programmer to write on a higher abstraction level, and also adds type +distinctions and hence raises the level of checking of programs. + + +The module system +----------------- + + +### Top-level and supplementary module structure + +The big picture of GF as a programming language for multilingual +grammars explains its principal module structure. Any GF grammar must +have an abstract syntax module; it can in addition have any number of +concrete syntax modules matching that abstract syntax. Before going to +details, we give a simple example: a module defining the **category** +`A` of adjectives and one adjective-forming **function**, the zero-place +function `Even`. We give the module the name `Adj`. The GF code for the +module looks as follows: + + abstract Adj = { + cat A ; + fun Even : A ; + } + +Here are two concrete syntax modules, one intended for mapping the trees +to English, the other to Swedish. The mappling is defined by `lincat` +definitions assigning a **linearization type** to each category, and +`lin` definitions assigning a **linearization** to each function. + + concrete AdjEng of Adj = { + lincat A = {s : Str} ; + lin Even = {s = "even"} ; + } + + concrete AdjSwe of Adj = { + lincat A = {s : AForm => Str} ; + lin Even = {s = table { + ASg Utr => "jämn" ; + ASg Neutr => "jämnt" ; + APl => "jämna" + } + } ; + param AForm = ASg Gender | APl ; + param Gender = Utr | Neutr ; + } + +These examples illustrate the main ideas of multilingual grammars: + +- the concrete syntax must match the abstract syntax: + - every `cat` is given a `lincat` + - every `fun` is given a `lin` + + + +- the concrete syntax is internally coherent: + - the `lin` rules respect the types defined by `lincat` rules + + + +- concrete syntaxes are independent of each other + - they can use different `lincat` and `lin` definitions + - they can define their own **parameter types** (`param`) + +The first two ideas form the core of the **static checking** of GF +grammars, eliminating the possibility of run-time errors in +linearization and parsing. The third idea gives GF the expressive power +needed to map abstract syntax to vastly different languages. + +Abstract and concrete modules are called **top-level grammar modules**, +since they are the ones that remain in grammar systems at run time. +However, in order to support **modular grammar engineering**, GF +provides much more module structure than strictly required in top-level +grammars. + +**Inheritance**, also known as **extension**, means that a module can +inherit the contents of one or more other modules to which new +judgements are added, e.g. + + abstract MoreAdj = Adj ** { + fun Odd : A ; + } + +**Resource modules** define parameter types and **operations** usable in +several concrete syntaxes, + + resource MorphoFre = { + param Number = Sg | Pl ; + param Gender = Masc | Fem ; + oper regA : Str -> {s : Gender => Number => Str} = + \fin -> { + s = table { + Masc => table {Sg => fin ; Pl => fin + "s"} ; + Fem => table {Sg => fin + "e" ; Pl => fin + "es"} + } + } ; + } + +By **opening**, a module can use the contents of a resource module +without inheriting them, e.g. + + concrete AdjFre of Adj = open MorphoFre in { + lincat A = {s : Gender => Number => Str} ; + lin Even = regA "pair" ; + } + +**Interfaces** and **instances** separate the contents of a resource +module to type signatures and definitions, in a way analogous to +abstract vs. concrete modules, e.g. + + interface Lexicon = { + oper Adjective : Type ; + oper even_A : Adjective ; + } + + instance LexiconEng of Lexicon = { + oper Adjective = {s : Str} ; + oper even_A = {s = "even"} ; + } + +**Functors** i.e. **parametrized modules** i.e. **incomplete modules**, +defining a concrete syntax in terms of an interface. + + incomplete concrete AdjI of Adj = open Lexicon in { + lincat A = Adjective ; + lin Even = even_A ; + } + +A functor can be **instantiated** by providing instances of its open +interfaces. + + concrete AdjEng of Adj = AdjI with (Lexicon = LexiconEng) ; + + +### Compilation units + +The compilation unit of GF source code is a file that contains a module. +Judgements outside modules are supported only for backward +compatibility, as explained [here](#oldgf). Every source file, suffixed +`.gf`, is compiled to a \"GF object file\", suffixed `.gfo` (as of GF +Version 3.0 and later). For runtime grammar objects used for parsing and +linearization, a set of `.gfo` files is linked to a single file suffixed +`.pgf`. While `.gf` and `.gfo` files may contain modules of any kinds, a +`.pgf` file always contains a multilingual grammar with one abstract and +a set of concrete syntaxes. + +The following diagram summarizes the files involved in the compilation +process. + +`module1.gf module2.gf ... modulen.gf` + +==\> + +`module1.gfo module2.gfo ... modulen.gfo` + +==\> + +grammar.pgf + +Both `.gf` and `.gfo` files are written in the GF source language; +`.pgf` files are written in a lower-level format. The process of +translating `.gf` to `.gfo` consists of **name resolution**, **type +annotation**, **partial evaluation**, and **optimization**. There is a +great advantage in the possibility to do this separately for GF modules +and saving the result in `.gfo` files. The partial evaluation phase, in +particular, is time and memory consuming, and GF libraries are therefore +distributed in `.gfo` to make their use less arduous. + +*In GF before version 3.0, the object files are in a format called +`.gfc`,* *and the multilingual runtime grammar is in a format called +`.gfcm`.* + +The standard compiler has a built-in **make facility**, which finds out +what other modules are needed when compiling an explicitly given module. +This facility builds a dependency graph and decides which of the +involved modules need recompilation (from `.gf` to `.gfo`), and for +which the GF object can be used directly. + + +### Names + +Each module *M* defines a set of **names**, which are visible in *M* +itself, in all modules extending *M* (unless excluded, as explained +[here](#restrictedinheritance)), and all modules opening *M*. These +names can stand for abstract syntax categories and functions, parameter +types and parameter constructors, and operations. All these names live +in the same **name space**, which means that a name entering a module +more than once due to inheritance or opening can lead to a **conflict**. +It is specified [here](#renaming) how these conflicts are resolved. + +The names of modules live in a name space separate from the other names. +Even here, all names must be distinct in a set of files compiled to a +multilingual grammar. In particular, even files residing in different +directories must have different names, since GF has no notion of +hierarchic module names. + +Lexically, names belong to the class of **identifiers**. An idenfifier +is a letter followed by any number of letters, digits, undercores (`_`) +and primes (`'`). Upper- and lower-case letters are treated as distinct. +Nothing dictates the choice of upper or lower-case initials, but the +standard libraries follow conventions similar to Haskell: + +- upper case is used for modules, abstract syntax categories and + functions, parameter types and constructors, and type synonyms +- lower case is used for non-type-valued operations and for variables + +[]{#identifiers} + +\"Letters\" as mentioned in the identifier syntax include all 7-bit +ASCII letters. Iso-latin-1 and Unicode letters are supported in varying +degrees by different tools and platforms, and are hence not recommended +in identifiers. + + +### The structure of a module + +Modules of all types have the following structure: + +*moduletype* *name* `=` *extends* *opens* *body* + +The part of the module preceding the body is its **header**. The header +defines the type of the module and tells what other modules it inherits +and opens. The body consists of the judgements that introduce all the +new names defined by the module. + +Any of the parts *extends*, *opens*, and *body* may be empty. If they +are all filled, delimiters and keywords separate the parts in the +following way: + +*moduletype* *name* `=` *extends* `**` `open` *opens* `in` `{` *body* +`}` + +The part *moduletype* *name* looks slightly different if the type is +`concrete` or `instance`: the *name* intrudes between the type keyword +and the name of the module being implemented and which really belongs to +the type of the module: + +`concrete` *name* `of` *abstractname* + +The only exception to the schema of functor syntax is functor +instantiations: the instantiation list is given in a special way between +*extends* and *opens*: + +`incomplete concrete` *name* `of` *abstractname* `=` *extends* `**` +*functorname* `with` *instantiations* `**` `open` *opens* `in` `{` +*body* `}` + +Logically, the part \"*functorname* `with` *instantiations*\" should +really be one of the *extends*. This is also shown by the fact that it +can have restricted inheritance (concept defined +[here](#restrictedinheritance)). + + +### Module types, headers, and bodies + +The *extends* and *opens* parts of a module header are lists of module +names (with possible qualifications, as defined below +[here](#qualifiednames)). The first step of type checking a module +consists of verifying that these names stand for modules of approptiate +module types. As a rule of thumb, + +- the *extends* of a module must have the same *moduletype* +- the *opens* of a module must be of type `resource` + +However, the precise rules are a little more fine-grained, because of +the presence of interfaces and their instances, and the possibility to +reuse abstract and concrete modules as resources. The following table +gives, for all module types, the possible module types of their +*extends* and *opens*, as well as the forms of judgement legal in that +module type. + + module type extends opens body + --------------------------- ------------ ------------ ---------------------------- + `abstract` abstract \- `cat, fun, def, data` + `concrete of` *abstract* concrete resource\* `lincat, cat, oper, param` + `resource` resource\* resource\* `oper, param` + `interface` resource+ resource\* `oper, param` + `instance of` *interface* resource\* resource\* `oper, param` + `incomplete` concrete concrete+ resource+ `lincat, cat, oper, param` + +The table uses the following shorthands for lists of module types: + +- resource\*: resource, instance, concrete +- resource+: resource\*, interface, abstract +- concrete+: concrete, incomplete concrete + +The legality of judgements in the body is checked before the judgements +themselves are checked. + +The forms of judgement are explained [here](#judgementforms). + + +### Digression: the logic of module types + +Why are the legality conditions of opens and extends so complicated? The +best way to grasp them is probably to consider a simplified logical +model of the module system, replacing modules by types and functions. +This model could actually be developed towards treating modules in GF as +first-class objects; so far, however, this step has not been motivated +by any practical needs. + + module object and type + ------------------------------------------ ----------------------- + abstract A = B A = B : type + concrete C of A = B C = B : A -\> S + interface I = B I = B : type + instance J of I = B J = B : I + incomplete concrete C of A = open I in B C = B : I -\> A -\> S + concrete K of A = C with (I=J) K = B(J) : A -\> S + resource R = B R = B : I + concrete C of A = open R in B C = B(R) : A -\> S + +A further step of defining modules as first-class objects would use +GADTs and record types: + +- an abstract syntax is a Generalized Algebraic Datatype (GADT) +- the target type `S` of concrete syntax is the type of nested tuples + over strings and integers +- an interface is a labelled record type +- an instance is a record of the type defined by the interface +- a functor, with a module body opening an interface, is a function on + its instances +- the instantiation of a functor is an application of the function to + some instance +- a resource is a typed labelled record, putting together an interface + and an instance of it +- the body of a module opening a resource is as a function on the + interface implicit in the resource; this function is immediately + applied to the instance defined in the resource + +Slightly unexpectedly, interfaces and instances are easier to understand +in this way than resources - a resource is, indeed, more complex, since +it fuses together an interface and an instance. + +[]{#openabstract} + +When an abstract is used as an interface and a concrete as its instance, +they are actually reinterpreted so that they match the model. Then the +abstract is no longer a GADT, but a system of *abstract* datatypes, with +a record field of type `Type` for each category, and a function among +these types for each abstract syntax function. A concrete syntax +instantiates this record with linearization types and linearizations. + + +### Inheritance + +After checking that the *extends* of a module are of appropriate module +types, the compiler adds the inherited judgements to the judgements +included in the body. The inherited judgements are not copied entirely, +but their names with links to the inherited module. Conflicts may arise +in this process: a name can have two definitions in the combined pool of +inherited and added judgements. Such a conflict is always an error: GF +provides no way to redefine an inherited constant. + +Simple as the definition of a conflict may sound, it has to take care of +the inheritance hierarchy. A very common pattern of inheritance is the +**diamond**: inheritance from two modules which themselves inherit a +common base module. Assume that the base module defines a name `f`: + + N + / \ + M1 M2 + \ / + Base {f} + +Now, `N` inherits `f` from both `M1` and `M2`, so is there a conflict? +The answer in GF is *no*, because the \"two\" `f`\'s are in the end the +same: the one defined in `Base`. The situation is thus simpler than in +**multiple inheritance** in languages like C++, because definitions in +GF are **immutable**: neither `M1` nor `M2` can possibly have changed +the definition of `f` given in `Base`. In practice, the compiler manages +inheritance through hierarchy in a very simple way, by just always +creating a link not to the immediate parent, but the original ancestor; +this ancestor can be read from the link provided by the immediate +parent. Here is how links are created from source modules by the +compiler: + + Base {f} + M1 {m1} ===> M1 {Base.f, m1} + M2 {m2} ===> M2 {Base.f, m2} + N {n} ===> N {Base.f, M1.m1, M2.m2, n} + +[]{#restrictedinheritance} + +Inheritance can be **restricted**. This means that a module can be +specified as inheriting *only* explicitly listed constants, or all +constants *except* ones explicitly listed. The syntax uses constant +names in brackets, prefixed by a minus sign in the case of an exclusion +list. In the following configuration, N inherits `a,b,c` from `M1`, and +all names but `d` from `M2` + + N = M1 {a,b,c}, M2-{d} + +Restrictions are performed as a part of inheritance linking, module by +module: the link is created for a constant if and only if it is both +included in the module and compatible with the restriction. Thus, for +instance, an inadvertent usage can exclude a constant from one module +but inherit it from another one. In the following configuration, `f` is +inherited via `M1`, if `M1` inherits it. + + N = M1 [a,b,c], M2-[f] + +Unintended inheritance may cause problems later in compilation, in the +judgement-level dependency analysis phase. For instance, suppose a +function `f` has category `C` as its type in `M`, and we only include +`f`. The exclusion has the effect of creating an ill-formed module: + + abstract M = {cat C ; fun f : C ;} + M [f] ===> {fun f : C ;} + +One might expect inheritance restriction to be transitive: if an +included constant *b* depends on some other constant *a*, then *a* +should be included automatically. However, this rule would leave to +hard-to-detect inheritances. And it could only be applied later in the +compilation phase, when the compiler has not only collected the names +defined, but also resolved the names used in definitions. + +Yet another pitfall with restricted inheritance is that it must be +stated for each module separately. For instance, a concrete syntax of an +abstract must exclude all those names that the abstract does, and a +functor instantiation must replicate all restrictions of the functor. + + +### Opening + +Opening makes constants from other modules usable in judgements, without +inheriting them. This means that, unlike inheritance, opening is not +transitive. + +[]{#qualifiednames} + +Opening cannot be restricted as inheritance can, but it can be +**qualified**. This means that the names from the opened modules cannot +be used as such, but only as prefixed by a qualifier and a dot (`.`). +The qualifier can be any identifier, including the name of the module. +Here is an example of an *opens* list: + + open A, (X = XSLTS), (Y = XSLTS), B + +If `A` defines the constant `a`, it can be accessed by the names + + a A.a + +If `XSLTS` defines the constant `x`, it can be accessed by the names + + X.x Y.x XSLTS.x + +Thus qualification by real module name is always possible, and one and +the same module can be qualified in different ways at the same time (the +latter can be useful if you want to be able to change the +implementations of some constants to a different resource later). Since +the qualification with real module name is always possible, it is not +possible to \"swap\" the names of modules locally: + + open (A=B), (B=A) -- NOT POSSIBLE! + +The list of qualifiers names and module names in a module header may +thus not contain any duplicates. + + +### Name resolution + +[]{#renaming} + +**Name resolution** is the compiler phase taking place after inheritance +linking. It qualifies all names occurring in the definition parts of +judgements (that is, just excluding the defined names themselves) with +the names of the modules they come from. If a name can come from +different modules (that is, not from their common ancestor), a conflict +is reported; this decision is hence not dependent on e.g. types, which +are known only at a later phase. + +Qualification of names is the main device for avoiding conflicts in name +resolution. No other information is used, such as priorities between +modules. However, if a name is defined in different opened modules but +never used in the module body, a conflict does not arise: conflicts +arise only when names are used. Also in this respect, opening is thus +different from inheritance, where conflicts are checked independently of +use. + +As usual, inner scope has priority in name resolution. This means that +if an identifier is in scope as a bound variable, it will not be +interpreted as a constant, unless qualified by a module name (variable +bindings are explained [here](#variablebinding)). + + +### Functor instantiations + +We have dealt with the principles of module headers, inheritance, and +names in a general way that applies to all module types. The exception +is functor instantiations, that have an extra part of the instantiating +equations, assigning an instance to every interface. Here is a typical +example, displaying the full generality: + + concrete FoodsEng of Foods = PhrasesEng ** + FoodsI-[Pizza] with + (Syntax = SyntaxEng), + (LexFoods = LexFoodsEng) ** + open SyntaxEng, ParadigmsEng in { + lin Pizza = mkCN (mkA "Italian") (mkN "pie") ; + } + +(The example is modified from Section 5.9 in the GF Tutorial.) + +The instantiation syntax is similar to qualified *opens*. The +left-hand-side names must be interfaces, the right-hand-side names their +instances. (Recall that `abstract` can be use as `interface` and +`concrete` as its `instance`.) Inheritance from the functor can be +restricted, typically in the purpose of defining some excluded functions +in language-specific ways in the module body. + + +### Completeness + +(This section refers to the forms of judgement introduced +[here](#judgementforms).) + +A `concrete` is complete with respect to an `abstract`, if it contains a +`lincat` definition for every `cat` declaration, and a `lin` definition +for every `fun` declaration. + +The same completeness criterion applies to functor instantiations. It is +not possible to use a partial functor instantiation, leading to another +functor. + +Functors do not need to be complete in the sense concrete modules need. +The missing definitions can then be provided in the body of each functor +instantiation. + +A `resource` is complete, if all its `oper` and `param` judgements have +a definition part. While a `resource` must be complete, an `interface` +need not. For an `interface`, it is the definition parts of judgements +are optional. + +An `instance` is complete with respect to an `interface`, if it gives +the definition parts of all `oper` and `param` judgements that are +omitted in the `interface`. Giving definitions to judgements that have +already been defined in the `interface` is illegal. Type signatures, on +the other hand, can be repeated if the same types are used. + +In addition to completing the definitions in an `interface`, its +instance may contain other judgements, but these must all be complete +with definitions. + +Here is an example of an instance and its interface showing the above +variations: + + interface Pos = { + param Case ; -- no definition + param Number = Sg | Pl ; -- definition given + oper Noun : Type = { -- relative definition given + s : Number => Case => Str + } ; + oper regNoun : Str -> Noun ; -- no definition + } + + instance PosEng of Pos = { + param Case = Nom | Gen ; -- definition of Case + -- Number and Noun inherited + oper regNoun = \dog -> { -- type of regNoun inherited + s = table { -- definition of regNoun + Sg => table { + Nom => dog + -- etc + } + } ; + oper house_N : Noun = -- new definition + regNoun "house" ; + } + + +Judgements +---------- + + +### Overview of the forms of judgement + +[]{#judgementforms} + +A module body in GF is a set of **judgements**. Judgements are +definitions or declarations, sometimes combinations of the two; the +common feature is that every judgement introduces a name, which is +available in the module and whenever the module is extended or opened. + +There are several different **forms of judgement**, identified by +different **judgement keywords**. Here is a list of all these forms, +together with syntax descriptions and the types of modules in which each +form can occur. The table moreover indicates whether the judgement has a +default value, and whether it contributes to the **name base**, i.e. +introduces a new name to the scope. + + judgement where module default base + ----------------------------- ---------------------------- ------------ --------- ------ + `cat` C G G context abstract N/A yes + `fun` f : A A type abstract N/A yes + `def` f ps = t f fun, ps patterns, t term abstract yes no + `data` C = f `|` \... `|` g C cat, f\...g fun abstract yes no + `lincat` C = T C cat, T type concrete\* yes yes + `lin` f = t f fun, t term concrete\* no yes + `lindef` C = t C cat, t term concrete\* yes no + `linref` C = t C cat, t term concrete\* yes no + `printname cat` C = t C cat, t term concrete\* yes no + `printname fun` f = t f fun, t term concrete\* yes no + `param` P = C`|` \... `|` D C\...D constructors resource\* N/A yes + `oper` f : T = t T type, t term resource\* N/A yes + `flags` o = v o flag, v value all yes N/A + +Judgements that have default values are rarely used, except `lincat` and +`flags`, which often need values different from the defaults. + +Introducing a name twice in the same module is an error. In other words, +all judgements that have a \"yes\" in the name base column, must have +distinct identifiers on their left-hand sides. + +All judgement end with semicolons (`;`). + +In addition to the syntax given in the table, many of the forms have +syntactic sugar. This sugar will be explained below in connection to +each form. There are moreover two kinds of syntactic sugar common to all +forms: + +- the judgement keyword is shared between consecutive judgements until + a new keyword appears: + `keyw J ; K ;` === `keyw J ; keyw K ;` +- the right-hand sides of colon (`:`) and equality (`=`) can be + shared, by using comma (`,`) as separator of left-hand sides, which + must consist of identifiers + `c,d : T` === `c : T ; d : T ;` + `c,d = t` === `c = t ; d = t ;` + +These conventions, like all syntactic sugar, are performed at an early +compilation phase, directly after parsing. This means that e.g. + + lin f,g = \x -> x ; + +can be correct even though `f` and `g` required different function +types. + +Within a module, judgements can occur in any order. In particular, a +name can be used before it is introduced. + +The explanations of judgement forms refer to the notions of **type** and +**term** (the latter also called **expression**). These notions will be +explained in detail [here](#expressions). + + +### Category declarations, cat + +[]{#catjudgements} + +Category declarations + +`cat` *C* *G* + +define the **basic types** of abstract syntax. A basic type is formed +from a category by giving values to all variables in the **context** +*G*. If the context is empty, the basic type looks the same as the +category itself. Otherwise, application syntax is used: + +*C* *a*~1~\...*a*~n~ + + +### Hypotheses and contexts + +[]{#contexts} + +A context is a sequence of **hypotheses**, i.e. variable-type pairs. A +hypothesis is written + +`(` *x* `:` *T* `)` + +and a sequence does not have any separator symbols. As syntactic sugar, + +- variables can share a type, + `(` *x,y* `:` *T* `)` === `(` *x* `:` *T* `)` `(` *y* `:` *T* `)` +- a **wildcard** can be used for a variable not occurring in types + later in the context, + `(` `_` `:` *T* `)` === `(` *x* `:` *T* `)` +- if the variable does not occur later, it can be omitted altogether, + and parentheses are not used, + *T* === `(` *x* `:` *T* `)` + But if *T* is more complex than an identifier, it needs parentheses + to be separated from the rest of the context. + +An abstract syntax has **dependent types**, if any of its categories has +a non-empty context. + + +### Function declarations, fun + +Function declarations, + +`fun` *f* `:` *T* + +define the **syntactic constructors** of abstract syntax. The type *T* +of *f* is built built from basic types (formed from categories) by using +the function type constructor `->`. Thus its form is + +(*x*~1~ `:` *A*~1~) `->` \... `->` (*x*~n~ `:` *A*~n~) `->` *B* + +where *Ai* are types, called the **argument types**, and *B* is a basic +type, called the **value type** of *f*. The **value category** of *f* is +the category that forms the type *B*. + +A **syntax tree** is formed from *f* by applying it to a full list of +arguments, so that the result is of a basic type. + +A **higher-order function** is one that has a function type as an +argument. The concrete syntax of GF does not support displaying the +bound variables of functions of higher than second order, but they are +legal in abstract syntax. + +An abstract syntax is **context-free**, if it has neither dependent +types nor higher-order functions. Grammars with context-free abstract +syntax are an important subclass of GF, with more limited complexity +than full GF. Whether the *concrete* syntax is context-free in the sense +of the Chomsky hierarchy is independent of the context-freeness of the +abstract syntax. + + +### Function definitions, def + +Function definitions, + +`def` *f* *p*~1~ \... *p*~n~ `=` *t* + +where *f* is a `fun` function and *p*~i~\# are patterns, impose a +relation of **definitional equality** on abstract syntax trees. They +form the basis of **computation**, which is used when comparing whether +two types are equal; this notion is relevant only if the types are +dependent. Computation can also be used for the **normalization** of +syntax trees, which applies even in context-free abstract syntax. + +The set of `def` definitions for *f* can be scattered around the module +in which *f* is introduced as a function. The compiler builds the set of +pattern equations in the order in which the equations appear; this order +is significant in the case of overlapping patterns. All equations must +appear in the same module in which *f* itself declared. + +The syntax of patterns will be specified [here](#patternmatching), +commonly for abstract and concrete syntax. In abstract syntax, +**constructor patterns** are those of the form + +*C* *p*~1~ \... *p*~n~ + +where *C* is declared as `data` for some abstract syntax category (see +next section). A **variable pattern** is either an identifier or a +wildcard. + +A common pitfall is to forget to declare a constructor as data, which +causes it to be interpreted as a variable pattern in definitions. + +Computation is performed by applying definitions and beta conversions, +and in general by using **pattern matching**. Computation and pattern +matching are explained commonly for abstract and concrete syntax +[here](#patternmatching). + +In contrast to concrete syntax, abstract syntax computation is +completely **symbolic**: it does not produce a value, but just another +term. Hence it is not an error to have incomplete systems of pattern +equations for a function. In addition, the definitions can be +**recursive**, which means that computation can fail to terminate; this +can never happen in concrete syntax. + + +### Data constructor definitions, data + +A data constructor definition, + +`data` *C* `=` *f*~1~ `|` \... `|` *f*~n~ + +defines the functions *f1*\...*fn* to be **constructors** of the +category *C*. This means that they are recognized as constructor +patterns when used in function definitions. + +In order for the data constructor definition to be correct, +*f*~1~\...*f*~n~ must be functions with *C* as their value category. + +The complete set of constructors for a category *C* is the union of all +its data constructor definitions. Thus a category can be \"extended\" by +new constructors afterwards. However, all these constructor definitions +must appear in the same module in which the category is itself defined. + +There is syntactic sugar for declaring a function as a constructor at +the same time as introducing it: + +`data` *f* : *A*~1~ `->` \... `->` *A*~n~ `->` *C* *t*~1~ \... *t*~m~ + +=== + +`fun` *f* : *A*~1~ `->` \... `->` *A*~n~ `->` *C* *t*~1~ \... *t*~m~ ; +`data` *C* = *f* + + +### The semantic status of an abstract syntax function + +There are three possible statuses for a function declared in a `fun` +judgement: + +- primitive notion: the default status +- constructor: the function appears on the right-hand side in `data` + judgement +- defined: the function has a `def` definition + +The \"constructor\" and \"defined\" statuses are in contradiction with +each other, whereas the primitive notion status is overridden by any of +the two others. + +This distinction is relevant for the semantics of abstract syntax, not +for concrete syntax. It shows in the way patterns are treated in +equations in `def` definitions: a constructor in a pattern matches only +itself, whereas any other name is treated as a variable pattern, which +matches anything. + + +### Linearization type definitions, lincat + +A linearization type definition, + +`lincat` *C* `=` *T* + +defines the type of linearizations of trees whose type has category *C*. +Type dependences have no effect on the linearization type. + +The type *T* must be a **legal linearization type**, which means that it +is a *record type* whose fields have either parameter types, the type +Str of strings, or table or record types of these. In particular, +function types may not appear in *T*. A detailed explanation of types in +concrete syntax will be given [here](#cnctypes). + +If *K* is the concrete syntax of an abstract syntax *A*, then *K* must +define the linearization type of all categories declared in *A*. +However, the definition can be omitted from the source code, in which +case the default type `{s : Str}` is used. + + +### Linearization definitions, lin + +A linearization definition, + +`lin` *f* `=` *t* + +defines the linearizations function of function *f*, i.e. the function +used for linearizing trees formed by *f*. + +The type of *t* must be the homomorphic image of the type of *f*. In +other words, if + +`fun` *f* `:` *A*~1~ `->` \... `->` *A*~n~ `->` *A* + +then + +`lin` *f* `:` *A*~1~\* `->` \... `->` *A*~n~\* `->` *A*\* + +where the type *T*\* is defined as follows depending on *T*: + +- (*C* *t*~1~ \... *t*~n~)\* = *T*, if `lincat` *C* `=` *T* +- (*B*~1~ `->` \... `->` *B*~m~ `->` *B*)\* = *B*\* + `** {$0,...,$m : Str}` + +The second case is relevant for higher-order functions only. It says +that the linearization type of the value type is extended by adding a +string field for each argument types; these fields store the variable +symbol used for the binding of each variable. + +[]{#HOAS} + +Since the arguments of a function argument are treated as bare strings, +orders higher than the second are irrelevant for concrete syntax. + +There is syntactic sugar for binding the variables of the linearization +of a function on the left-hand side: + +`lin` *f* *p* `=` *t* === `lin` *f* `= \`*p* `->` *t* + +The pattern *p* must be either a variable or a wildcard (`_`); this is +what the syntax of lambda abstracts (`\p -> t`) requires. + + +### Linearization default definitions, lindef + +[]{#lindefjudgements} + +A linearization default definition, + +`lindef` *C* `=` *t* + +defines the default linearization of category *C*, i.e. the function +applicable to a string to make it into an object of the linearization +type of *C*. + +Linearization defaults are invoked when linearizing variable bindings in +higher-order abstract syntax. A variable symbol is then presented as a +string, which must be converted to correct type in order for the +linearization not to fail with an error. + +The other use of the defaults is for linearizing metavariables and +abstract functions without linearization in the concrete syntax. In the +first case the default linearization is applied to the string `"?X"` +where `X` is the unique index of the metavariable, and in the second +case the string is `"[f]"` where `f` is the name of the abstract +function with missing linearization. + +Usually, linearization defaults are generated by using the default rule +that \"uses the symbol itself for every string, and the first value of +the parameter type for every parameter\". The precise definition is by +structural recursion on the type: + +- default(Str,s) = s +- default(P,s) = \#1(P) +- default(P =\> T,s) = `\\_ =>` default(T,s) +- default(`{`\... ; r : R ; \...`}`,s) = `{`\... ; r : default(R,s) ; + \...`}` + +The notion of the first value of a parameter type (\#1(P)) is defined +[below](#paramvalues). + + +### Linearization reference definitions, linref + +[]{#linrefjudgements} + +A linearization reference definition, + +`linref` *C* `=` *t* + +defines the reference linearization of category *C*, i.e. the function +applicable to an object of the linearization type of *C* to make it into +a string. + +The reference linearization is always applied to the top-level node of +the abstract syntax tree. For example when we linearize the tree +`f x1 x2 .. xn`, then we first apply `f` to its arguments which gives us +an object of the linearization type of its category. After that we apply +the reference linearization for the same category to get a string out of +the object. This is particularly useful when the linearization type of +*C* contains discontious constituents. In this case usually the +reference linearization glues the constituents together to produce an +intuitive linearization string. + +The reference linearization is also used for linearizing metavariables +which stand in function position. For example the tree +`f (? x1 x2 .. xn)` is linearized as follows. Each of the arguments +`x1 x2 .. xn` is linearized, and after that the reference linearization +of the its category is applied to the output of the linearization. The +result is a sequence of `n` strings which are concatenated into a single +string. The final string is the input to the default linearization of +the category for the argument of `f`. After applying the default +linearization we get an object that we could safely pass to `f`. + +Usually, linearization references are generated by using the rule that +\"picks the first string in the linearization type\". The precise +definition is by structural recursion on the type: + +- reference(Str,o) = Just o +- reference(P,s) = Nothing +- reference(P =\> T,o) = reference(T,o ! \#1(P)) \|\| reference(T,o ! + \#2(P)) \|\| \... \|\| reference(T,o ! \#n(P)) +- reference({r1 : R1; \... rn : Rn},o) = reference(R1, o.r1) \|\| + reference(R2, o.r2) \|\| \... \|\| reference(Rn, o.rn) + +Here each call to reference returns either `(Just o)` or `Nothing`. When +we compute the reference for a table or a record then we pick the +reference for the first expression for which the recursive call gives us +`Just`. If we get `Nothing` for all of them then the final result is +`Nothing` too. + + +### Printname definitions, printname cat and printname fun + +A category printname definition, + +`printname cat` *C* `=` *s* + +defines the printname of category *C*, i.e. the name used in some +abstract syntax information shown to the user. + +Likewise, a function printname definition, + +`printname fun` *f* `=` *s* + +defines the printname of function *f*, i.e. the name used in some +abstract syntax information shown to the user. + +The most common use of printnames is in the interactive syntax editor, +where printnames are displayed in menus. It is possible e.g. to adapt +them to each language, or to embed HTML tooltips in them (as is used in +some HTML-based editor GUIs). + +Usually, printnames are generated automatically from the symbol and/or +concrete syntax information. + + +### Parameter type definitions, param + +[]{#paramjudgements} + +A parameter type definition, + +`param` *P* `=` *C*~1~ *G*~1~ `|` \... `|` *C*~n~ *G*~n~ + +defines a parameter type *P* with the **parameter constructors** +*C*~1~\...*C*~n~, with their respective contexts *G*~1~\...*G*~n~. + +[]{#paramtypes} + +Contexts have the same syntax as in `cat` judgements, explained +[here](#catjudgements). Since dependent types are not available in +parameter type definitions, the use of variables is never necessary. The +types in the context must themselves be **parameter types**, which are +defined as follows: + +- Given the judgement `param` *P* \..., *P* is a parameter type. +- A record type of parameter types is a parameter type. +- `Ints` *n* (an initial segment of integers) is a parameter type. + +The names defined by a parameter type definition include both the type +name *P* and the constructor names *C*~i~. Therefore all these names +must be distinct in a module. + +A parameter type may not be recursive, i.e. *P* itself may not occur in +the contexts of its constructors. This restriction extends to mutual +recursion: we say that *P* **depends** on the types that occur in the +contexts of its constructors and on all types that those types depend +on, and state that *P* may not depend on itself. + +In an `interface module`, it is possible to declare a parameter type +without defining it, + +`param` *P* `;` + + +### Parameter values + +[]{#paramvalues} + +All parameter types are finite, and the GF compiler will internally +compute them to **lists of parameter values**. These lists are formed by +traversing the `param` definitions, usually respecting the order of +constructors in the source code. For records, bibliographical sorting is +applied. However, both the order of traversal of `param` definitions and +the order of fields in a record are specified in a compiler-internal +way, which means that the programmer should not rely on any particular +order. + +The order of the list of parameter values can affect the program in two +cases: + +- in the default `lindef` definition ([here](#lindefjudgements)), the + first value is chosen +- in course-of-value tables ([here](#tables)), the compiler-internal + order is followed + +The first usage implies that, if `lindef` definitions are essential for +the application, they should be given manually. The second usage implies +that course-of-value tables should be avoided in hand-written GF code. + +In run-time grammar generation, all parameter values are translated to +integers denotions positions in these parameter lists. + + +### Operation definitions, oper + +An operation definition, + +`oper` *h* `:` *T* `=` *t* + +defines an **operation** *h* of type *T*, with the computation rule + +*h* ==\> *t* + +The type *T* can be any concrete syntax type, including function types +of any order. The term *t* must have the type *T*, as defined +[here](#expressions). + +As syntactic sugar, the type can be omitted, + +`oper` *h* `=` *t* + +which works in two cases + +- the type can be inferred from *t* (compiler-dependent) +- the definition occurs in an `instance` and the type is given in the + `interface` + +It is also possible to give the type and the definition separately: + +`oper` *h* `:` *T* ; `oper` *h* `=` *t* === `oper` *h* `:` *T* `=` *t* + +The order of the type part and the definition part is free, and there +can be other judgements in between. However, they must occur in the same +`resource` module for it to be complete (as defined +[here](#completeness)). In an `interface` module, it is enough to give +the type. + +When only the definition is given, it is possible to use a shorthand +similar to `lin` judgements: + +`oper` *h* *p* `=` *t* === `oper` *h* `=` `\`*p* `->` *t* + +The pattern *p* is either a variable or a wildcard (`_`). + +Operation definitions may not be recursive, not even mutually recursive. +This condition ensures that functions can in the end be eliminated from +concrete syntax code (as explained [here](#functionelimination)). + + +### Operation overloading + +[]{#overloading} + +One and the same operation name *h* can be used for different +operations, which have to have different types. For each call of *h*, +the type checker selects one of these operations depending on what type +is expected in the context of the call. The syntax of overloaded +operation definitions is + +`oper` *h* `= overload {`*h* : *T*~1~ = *t*~1~ ; \... ; *h* : *T*~n~ = +*t*~n~`}` + +Notice that *h* must be the same in all cases. This format can be used +to give the complete implementation; to give just the types, e.g. in an +interface, one can use the form + +`oper` *h* `: overload {`*h* : *T*~1~ ; \... ; *h* : *T*~n~`}` + +The implementation of this operation typing is given by a judgement of +the first form. The order of branches need not be the same. + + +### Flag definitions, flags + +A flag definition, + +`flags` *o* `=` *v* + +sets the value of the flag *o*, to be used when compiling or using the +module. + +The flag *o* is an identifier, and the value *v* is either an identifier +or a quoted string. + +Flags are a kind of metadata, which do not strictly belong to the GF +language. For instance, compilers do not necessarily check the +consistency of flags, or the meaningfulness of their values. The +inheritance of flags is not well-defined; the only certain rule is that +flags set in the module body override the settings from inherited +modules. + +Here are some flags commonly included in grammars. + + flag value description module + ------------ -------------------- ---------------------------------- ---------- + `coding` character encoding encoding used in string literals concrete + `startcat` category default target of parsing abstract + +The possible values of these flags are specified [here](#flagvalues). +Note that the `lexer` and `unlexer` flags are deprecated. If you need +their functionality, you should use supply them to GF shell commands +like so: + + put_string -lextext "страви, напої" | parse + +A summary of their possible values can be found at the [GF shell +reference](http://www.grammaticalframework.org/doc/gf-shell-reference.html). + + +Types and expressions +--------------------- + + +### Overview of expression forms + +[]{#expressions} + +Like many dependently typed languages, GF makes no syntactic distinction +between expressions and types. An illegal use of a type as an expression +or vice versa comes out as a type error. Whether a variable, for +instance, stands for a type or an expression value, can only be resolved +from its context of use. + +One practical consequence of the common syntax is that global and local +definitions (`oper` judgements and `let` expressions, respectively) work +in the same way for types and expressions. Thus it is possible to +abbreviate a type occurring in a type expression: + + let A = {s : Str ; b : Bool} in A -> A -> A + +Type and other expressions have a system of **precedences**. The +following table summarizes all expression forms, from the highest to the +lowest precedence. Some expressions are moreover left- or +right-associative. + + prec expression example explanation + --------- ------------------------------------ ----------------------------------- + 7 `c` constant or variable + 7 `Type` the type of types + 7 `PType` the type of parameter types + 7 `Str` the type of strings/token lists + 7 `"foo"` string literal + 7 `123` integer literal + 7 `0.123` floating point literal + 7 `?` metavariable + 7 `[]` empty token list + 7 `[C a b]` list category + 7 `["foo bar"]` token list + 7 `{"s : Str ; n : Num}` record type + 7 `{"s = "foo" ; n = Sg}` record + 7 `` tuple + 7 `` type-annotated expression + 6 left `t.r` projection or qualification + 5 left `f a` function application + 5 `table {Sg => [] ; _ => "xs"}` table + 5 `table P [a ; b ; c]` course-of-values table + 5 `case n of {Sg => [] ; _ => "xs"}` case expression + 5 `variants {"color" ; "colour"}` free variation + 5 `pre {vowel => "an" ; _ => "a"}` prefix-dependent choice + 4 left `t ! v` table selection + 4 left `A * B` tuple type + 4 left `R ** {b : Bool}` record (type) extension + 3 left `t + s` token gluing + 2 left `t ++ s` token list concatenation + 1 right `\x,y -> t` function abstraction (\"lambda\") + 1 right `\\x,y => t` table abstraction + 1 right `(x : A) -> B` dependent function type + 1 right `A -> B` function type + 1 right `P => T` table type + 1 right `let x = v in t` local definition + 1 `t where {x = v}` local definition + 1 `in M.C "foo"` rule by example + +Any expression in parentheses (`(`*exp*`)`) is in the highest precedence +class. + + +### The functional fragment: expressions in abstract syntax + +[]{#functiontype} + +The expression syntax is the same in abstract and concrete syntax, +although only a part of the syntax is actually usable in well-typed +expressions in abstract syntax. An abstract syntax is essentially used +for defining a set of types and a set of functions between those types. +Therefore it needs essentially the **functional fragment** of the +syntax. This fragment comprises two kinds of types: + +- **basic types**, of form *C a1\...an* where + - `cat` *C* (*x*~1~ : *A*~1~)\...(*x*~n~ : *A*~n~), including the + predefined categories `Int`, `Float`, and `String` explained + [here](#predefabs) + - *a*~1~ : *A*~1~,\...,*a*~n~ : *A*~n~{*x*~1~ = + *a*~1~,\...,*x*~n-1~=*a*~n-1~} + + + +- **function types**, of form (*x* : *A*) `->` *B*, where + - *A* is a type + - *B* is a type possibly depending on *x* : *A* + +When defining basic types, we used the notation *t*{*x*~1~ = +*t*~1~,\...,*x*~n~=*t*~n~} for the **substitution** of values to +variables. This is a metalevel notation, which denotes a term that is +formed by replacing the free occurrences of each variable *x*~i~ by +*t*~i~. + +These types have six kinds of expressions: + +- **constants**, *f* : *A* where + - `fun` *f* : *A* + + + +- **literals** for integers, floats, and strings (defined in + [here](#predefabs)) + + + +- **variables**, *x* : *A* where + - *x* has been introduced by a binding + + + +- **applications**, *f a* : *B*{*x*=*a*}, where + - *f* : (*x* : *A*) `->` *B* + - *a* : *A* + + + +- **abstractions**, `\`*x* `->` *b* : (*x* : *A*) `->` *B*, where + - *b* : *B* possibly depending on *x* : *A* + + + +- **metavariables**, `?`, as introduced in intermediate phases of + incremental type checking; metavariables are not permitted in GF + source code + +[]{#variablebinding} + +The notion of **binding** is defined for occurrences of variables in +subexpressions as follows: + +- in (*x* : *A*) `->` *B*, *x* is bound in *B* +- in `\`*x* `->` *b*, *x* is bound in *b* +- in `def` *f* *p*~1~ \... *p*~n~ = *t*, any pattern variable + introduced in any *pi* is bound in *t* (as defined + [here](#patternmatching)) + +As syntactic sugar, function types have sharing of types and suppression +of variables, in the same way as contexts (defined [here](#contexts)): + +- variables can share a type, + `(` *x,y* `:` *A* `)` `->` *B* === `(` *x* `:` *A* `) -> (` *y* `:` + *A* `) ->` *B* +- a **wildcard** can be used for a variable not occurring later in the + type, + `(` `_` `:` *A* `) ->` *B* === `(` *x* `:` *T* `) ->` *B* +- if the variable does not occur later, it can be omitted altogether, + and parentheses are not used, + *A* `->` *B* === `(` *\_* `:` *A* `) ->` *B* + +There is analogous syntactic sugar for constant functions, + +`\`*\_* `->` *t* === `\`*x* `->` *t* + +where *x* does not occur in *t*, and for multiple lambda abstractions: + +`\`*p,q* `->` *t* === `\`*p* `->` `\`*q* `->` *t* + +where *p* and *q* are variables or wild cards (`_`). + + +### Conversions + +Among expressions, there is a relation of **definitional equality** +defined by four **conversion rules**: + +- **alpha conversion**: `\`*x* `->` *b* = `\`*y* `->` *b*{*x*=*y*} + + + +- **beta conversion**: (`\`*x* `->` *b*) *a* = *b*{*x*=*a*} + + + +- **delta conversion**: *f* *a*~1~ \... *a*~n~ = *tg*, if + - there is a definition `def` *f* *p*~1~ \... *p*~n~ = *t* + - this definition is the first for *f* that matches the sequence + *a*~1~ \.... *a*~n~, with the substitution *g* + + + +- **eta conversion**: *c* = `\`*x* `->` *c x*, if *c* : (*x* : *A*) + `->` *B* + +Pattern matching substitution used in delta conversion is defined +[here](#patternmatching). + +An expression is in **beta-eta-normal form** if + +- it has no subexpressions to which beta conversion applies (beta + normality) +- each constant or variable whose type is a function type must be + **eta-expanded**, i.e. made into an abstract equal to it by eta + conversion (eta normality) + +Notice that the iteration of eta expansion would lead to an expression +not in beta-normal form. + + +### Syntax trees + +[]{#syntaxtrees} + +The **syntax trees** defined by an abstract syntax are well-typed +expressions of basic types in beta-eta normal form. Linearization +defined in concrete syntax applies to all and only these expressions. + +There is also a direct definition of syntax trees, which does not refer +to beta and eta conversions: keeping in mind that a type always has the +form + +(*x*~1~ : *A*~1~) `->` \... `->` (*x*~n~ : *A*~n~) `->` *B* + +where *Ai* are types and *B* is a basic type, a syntax tree is an +expression + +*b* *t*~1~ \... *t*~n~ : *B\'* + +where + +- *B\'* is the basic type *B*{*x*~1~ = *t*~1~,\...,*x*~n~ = *t*~n~} +- `fun` *b* : (*x*~1~ : *A*~1~) `->` \... `->` (*x*~n~ : *A*~n~) `->` + *B* +- each *t*~i~ has the form `\`*z*~1~,\...,*z*~m~ `->` *c* where *A*~i~ + is + (*y*~1~ : *B*~1~) `->` \... `->` (*y*~m~ : *B*~m~) `->` *B* + + +### Predefined types in abstract syntax + +[]{#predefabs} + +GF provides three predefined categories for abstract syntax, with +predefined expressions: + + category expressions + ---------- --------------------------------------- + `Int` integer literals, e.g. `123` + `Float` floating point literals, e.g. `12.34` + `String` string literals, e.g. `"foo"` + +These categories take no arguments, and they can be used as basic types +in the same way as if they were introduced in `cat` judgements. However, +it is not legal to define `fun` functions that have any of these types +as value type: their only well-typed expressions are literals as defined +in the above table. + + +### Overview of expressions in concrete syntax + +[]{#cnctypes} + +Concrete syntax is about defining mappings from abstract syntax trees to +**concrete syntax objects**. These objects comprise + +- records +- tables +- strings +- parameter values + +Thus functions are not concrete syntax objects; however, the mappings +themselves are expressed as functions, and the source code of a concrete +syntax can use functions under the condition that they can be eliminated +from the final compiled grammar (which they can; this is one of the +fundamental properties of compilation, as explained in more detail in +the *JFP* article). + +Concrete syntax thus has the same function types and expression forms as +abstract syntax, specified [here](#functiontype). The basic types +defined by categories (`cat` judgements) are available via grammar reuse +explained [here](#reuse); this also comprises the predefined categories +`Float` and `String`. + + +### Values, canonical forms, and run-time variables + +In abstract syntax, the conversion rules fiven [here](#conversions) +define a computational relation among expressions, but there is no +separate notion of a **value** of computation: the value (the end point) +of a computation chain is simply an expression to which no more +conversions apply. In general, we are interested in expressions that +satisfy the conditions of being syntax trees (as defined +[here](#syntaxtrees)), but there can be many computationally equivalent +syntax trees which nonetheless are distinct syntax trees and hence have +different linearizations. The main use of computation in abstract syntax +is to compare types in dependent type checking. + +In concrete syntax, the notion of values is central. At run time, we +want to compute the values of linearizations; at compile time, we want +to perform **partial evaluation**, which computes expressions as far as +possible. To specify what happens in computation we therefore have to +distinguish between **canonical forms** and other forms of expressions. +The canonical forms are defined separately for each form of type, +whereas the other forms may usually produce expressions of any type. + +[]{#linexpansion} []{#runtimevariables} + +What is done at compile time is the elimination of any noncanonical +forms, except for those depending on **run-time variables**. Run-time +variables are the same as the **argument variables** of linearization +rules, i.e. the variables *x*~1~,\...,*x*~n~ in + +`lin` *f* `= \` *x*~1~,\...,*x*~n~ `->` *t* + +where + +`fun` *f* `:` (*x*~1~ : *A*~1~) `->` \... `->` (*x*~n~ : *A*~n~) `->` +*B* + +Notice that this definition refers to the **eta-expanded** linearization +term, which has one abstracted variable for each argument type of *f*. +These variables are not necessarily explicit in GF source code, but +introduced by the compiler. + +Since certain expression forms should be eliminated in compilation but +cannot be eliminated if run-time variables appear in them, errors can +appear late in compilation. This is an issue with the following +expression forms: + +- gluing (`s + t`), defined [here](#gluing) +- pattern matching on strings, defined [here](#patternmatching) +- predefined string operations, defined [here](#predefcnc) (those + taking `Str` arguments) + + +### Token lists, tokens, and strings + +[]{#strtype} + +The most prominent basic type is `Str`, the type of **token lists**. +This type is often sloppily referred to as the type of **strings**; but +it should be kept in mind that the objects of `Str` are *lists* of +strings rather than single strings. + +Expressions of type `Str` have the following canonical forms: + +- **tokens**, i.e. **string literals**, in double quotes, e.g. `"foo"` +- **the empty token list**, `[]` +- **concatenation**, *s* `++` *t*, where *s,t* : `Str` +- **prefix-dependent choice**, + `pre {p1 => s1 ; ... ; pn => sn ; _ => s }`, where + - *s*, *s*~1~,\...,*s*~n~, *p*~1~,\...,*p*~n~ : `Str` + +For convenience, the notation is overloaded so that tokens are +identified with singleton token lists, and there is no separate type of +tokens (this is a change from the *JFP* article). The notion of a token +is still important for compilation: all tokens introduced by the grammar +must be known at compile time. This, in turn, is required by the parsing +algorithms used for parsing with GF grammars. + +In addition to string literals, tokens can be formed by a specific +non-canonical operator: + +- **gluing**, *s* `+` *t*, where *s,t* : `Str` + +[]{#gluing} + +Being noncanonical, gluing is equipped with a computation rule: string +literals are glued by forming a new string literal, and empty token +lists can be ignored: + +- `"foo" + "bar"` ==\> `"foobar"` +- *t* `+ []` ==\> *t* +- `[] +` *t* ==\> *t* + +Since tokens must be known at compile time, the operands of gluing may +not depend on run-time variables, as defined [here](#runtimevariables). + +As syntactic sugar, token lists can be given as bracketed string +literals, where spaces separate tokens: + +- **token lists**, `["one two three"]` === `"one" ++ "two" ++ "three"` + +Notice that there are no empty tokens, but the expression `[]` can be +used in a context requiring a token, in particular in gluing expression +below. Since `[]` denotes an empty token list, the following computation +laws are valid: + +- *t* `++ []` ==\> *t* +- `[] ++` *t* ==\> *t* + +Moreover, concatenation and gluing are associative: + +- s `+` (t `+` u) ==\> s `+` t `+` u +- s `++` (t `++` u) ==\> s `++` t `++` u + +For the programmer, associativity and the empty token laws mean that the +compiler can use them to simplify string expressions. It also means that +these laws are respected in pattern matching on strings. + +A prime example of prefix-dependent choice operation is the following +approximative expression for the English indefinite article: + + pre { + ("a" | "e" | "i" | "o") => "an" ; + _ => "a" + } ; + +This expression can be computed in the context of a subsequent token: + +- `pre {p1 => s1 ; ... ; pn => sn ; _ => s } ++ t` ==\> + - *s*~i~ for the first *i* such that the prefix *p*~i~ matches + *t*, if it exists + - *s* otherwise + +The **matching prefix** is defined by comparing the string with the +prefix of the token. If the prefix is a variant list of strings, then it +matches the token if any of the strings in the list matches it. + +The computation rule can sometimes be applied at compile time, but it +general, prefix-dependent choices need to be passed to the run-time +grammar, because they are not given a subsequent token to compare with, +or because the subsequent token depends on a run-time variable. + +The prefix-dependent choice expression itself may not depend on run-time +variables. + +*There is an older syntax for prefix-dependent choice, namely: +`pre { s ; s1 / p1 ; ... ; sn / pn}`. This syntax will not accept +strings as patterns.* + +*In GF prior to 3.0, a specific type* `Strs` *is used for defining +prefixes,* *instead of just* `variants` *of* `Str`. + + +### Records and record types + +A **record** is a collection of objects of possibly different types, +accessible by **projections** from the record with **labels** pointing +to these objects. A record is also itself an object, whose type is a +**record type**. Record types have the form + +`{` *r*~1~ : *A*~1~ `;` \... `;` *r*~n~ : *A*~n~ `}` + +where *n* \>= 0, each *A*~i~ is a type, and the labels *r*~i~ are +distinct. A record of this type has the form + +`{` *r*~1~ = *a*~1~ `;` \... `;` *r*~n~ = *a*~n~ `}` + +where each \#aii : \"Aii. A limiting case is the **empty record type** +`{}`, which has the object `{}`, the **empty record**. + +The **fields** of a record type are its parts of the form *r* : *A*, +also called **typings**. The **fields** of a record are of the form *r* += *a*, also called **value assignments**. Value assignments may +optionally indicate the type, as in *r* : *A* = *a*. + +The order of fields in record types and records is insignificant: two +record types (or records) are equal if they have the same fields, in any +order, and a record is an object of a record type, if it has +type-correct value assignments for all fields of the record type. The +latter definition implies the even stronger principle of **record +subtyping**: a record can have any type that has some subset of its +fields. This principle is explained further [here](#subtyping). + +All fields in a record must have distinct labels. Thus it is not +possible e.g. to \"redefine\" a field \"later\" in a record. + +Lexically, labels are identifiers (defined [here](#identifiers)). This +is with the exception of the labels selecting bound variables in the +linearization of higher-order abstract syntax, which have the form +`$`*i* for an integer *i*, as specified [here](#HOAS). In source code, +these labels should not appear in records fields, but only in +selections. + +Labels occur only in syntactic positions where they cannot be confused +with constants or variables. Therefore it is safe to write, as in +`Prelude`, + + ss : Str -> {s : Str} = \s -> {s = s} ; + +A **projection** is an expression of the form + +*t*.*r* + +where *t* must be a record and *r* must be a label defined in it. The +type of the projection is the type of that field. The computation rule +for projection returns the value assigned to that field: + +`{` \... `;` *r* = *a* `;` \... `}.`*r* ==\> *a* + +Notice that the dot notation *t*.*r* is also used for qualified names as +specified [here](#qualifiednames). This ambiguity follows tradition and +convenience. It is resolved by the following rules (before type +checking): + +1. if *t* is a bound variable or a constant in scope, *t*.*r* is + type-checked as a projection +2. otherwise, *t*.*r* is type-checked as a qualified name + +As syntactic sugar, types and values can be shared: + +- `{` \... `;` *r,s* : *A* `;` \... `}` === `{` \... `;` *r* : *A* `;` + *s* : *A* `;` \... `}` +- `{` \... `;` *r,s* = *a* `;` \... `}` === `{` \... `;` *r* = *a* `;` + *s* = *a* `;` \... `}` + +Another syntactic sugar are **tuple types** and **tuples**, which are +translated by endowing their unlabelled fields by the labels `p1`, +`p2`,\... in the order of appearance of the fields: + +- *A*~1~ `*` \... `*` *A*~n~ === `{` `p1` : *A*~1~ `;` \... `;` `pn` : + *A*~n~ `}` +- `<`*a*~1~ `,` \... `,` *a*~n~ `>` === `{` `p1` = *a*~1~`;` \... `;` + `pn` = *a*~n~ `}` + +A **record extension** is formed by adding fields to a record or a +record type. The general syntax involves two expressions, + +*R* `**` *S* + +The result is a record type or a record with a union of the fields of +*R* and *S*. It is therefore well-formed if + +- both *R* and *S* are either records or record types +- the labels in *R* and *S* are disjoint, if *R* and *S* are record + types + +(Since GF version 3.6) If *R* and *S* are record objects, then the +labels in them need not be disjoint. Labels defined in *S* are then +given priority, so that record extensions in fact works as **record +update**. A common pattern of using this feature is + + lin F x ... = x ** {r = ... x.r ...} + +where `x` is a record with many fields, just one of which is updated. +Following the normal binding conditions, `x.r` on the right hand side +still refers to the old value of the `r` field. + + +### Subtyping + +The possibility of having superfluous fields in a record forms the basis +of the **subtyping** relation. That *A* is a subtype of *B* means that +*a : A* implies *a : B*. This is clearly satisfied for records with +superfluous fields: + +- if *R* is a record type without the label *r*, then *R* `** {` *r* : + *A* `}` is a subtype of *R* + +The GF grammar compiler extends subtyping to function types by +**covariance** and **contravariance**: + +- covariance: if *A* is a subtype of *B*, then *C* `->` *A* is a + subtype of *C* `->` *B* +- contravariance: if *A* is a subtype of *B*, then *B* `->` *C* is a + subtype of *A* `->` *C* + +The logic of these rules is natural: if a function is returns a value in +a subtype, then this value is *a fortiori* in the supertype. If a +function is defined for some type, then it is *a fortiori* defined for +any subtype. + +In addition to the well-known principles of record subtyping and co- and +contravariance, GF implements subtyping for initial segments of +integers: + +- if *m* \< *n*, then `Ints` *m* is a subtype of `Ints` *n* +- `Ints` *n* is a subtype of `Integer` + +As the last rule, subtyping is transitive: + +- if *A* is a subtype of *B* and *B* is a subtype of *C*, then *A* is + a subtype of *C*. + + +### Tables and table types + +[]{#tables} + +One of the most characteristic constructs of GF is **tables**, also +called **finite functions**. That these functions are finite means that +it is possible to finitely enumerate all argument-value pairs; this, in +turn, is possible because the argument types are finite. + +A **table type** has the form + +*P* `=>` *T* + +where *P* must be a parameter type in the sense defined +[here](#paramtypes), whereas *T* can be any type. + +Canonical expressions of table types are **tables**, of the form + +`table` `{` *V*~1~ `=>` *t*~1~ ; \... ; *V*~n~ `=>` *t*~n~ `}` + +where *V*~1~,\...,*V*~n~ is the complete list of the parameter values of +the argument type *P* (defined [here](#paramvalues)), and each *t*~i~ is +an expression of the value type *T*. + +In addition to explicit enumerations, tables can be given by **pattern +matching**, + +`table` `{`*p*~1~ `=>` *t*~1~ ; \... ; *p*~m~ `=>` *t*~m~`}` + +where *p*~1~,\....,*p*~m~ is a list of patterns that covers all values +of type *P*. Each pattern *p*~i~ may bind some variables, on which the +expression *t*~i~ may depend. A complete account of patterns and pattern +matching is given [here](#patternmatching). + +A **course-of-values table** omits the patterns and just lists all +values. It uses the enumeration of all values of the argument type *P* +to pair the values with arguments: + +`table` *P* `[`*t*~1~ ; \... ; *t*~n~`]` + +This format is not recommended for GF source code, since the ordering of +parameter values is not specified and therefore a compiler-internal +decision. + +The argument type can be indicated in ordinary tables as well, which is +sometimes helpful for type inference: + +`table` *P* `{` \... `}` + +The **selection** operator `!`, applied to a table *t* and to an +expression *v* of its argument type + +*t* `!` *v* + +returns the first pattern matching result from *t* with *v*, as defined +[here](#patternmatching). The order of patterns is thus significant as +long as the patterns contain variables or wildcards. When the compiler +reorders the patterns following the enumeration of all values of the +argument type, this order no longer matters, because no overlap remains +between patterns. + +The GF compiler performs **table expansion**, i.e. an analogue of eta +expansion defined [here](#conversions), where a table is applied to all +values to its argument type: + +*t* : *P* `=>` *T* ==\> `table` *P* `[`*t* `!` *V*~1~ ; \... ; *t* `!` +*V*~n~`]` + +As syntactic sugar, one-branch tables can be written in a way similar to +lambda abstractions: + +`\\`*p* `=>` *t* === `table {`*p* `=>` *t* `}` + +where *p* is either a variable or a wildcard (`_`). Multiple bindings +can be abbreviated: + +`\\`*p,q* `=>` *t* === `\\`*p* `=>` `\\`*q* `=>` *t* + +**Case expressions** are syntactic sugar for selections: + +`case` *e* `of {`\...`}` === `table {`\...`} !` *e* + + +### Pattern matching + +[]{#patternmatching} + +We will list all forms of patterns that can be used in table branches. +We define their **variable bindings** and **matching substitutions**. + +We start with the patterns available for all parameter types, as well as +for the types `Integer` and `Str`. + +- A constructor pattern *C* *p*~1~\...*p*~n~ binds the union of all + variables bound in the subpatterns *p*~1~,\...,*p*~n~. It matches + any value *C* *V*~1~\...*V*~n~ where each *p*~i~\# matches *V*~i~, + and the matching substitution is the union of these substitutions. +- A record pattern `{` *r*~1~ `=` *p*~1~ `;` \... `;` *r*~n~ `=` + *p*~n~ `}` binds the union of all variables bound in the subpatterns + *p*~1~,\...,*p*~n~. It matches any value `{` *r*~1~ `=` *V*~1~ `;` + \... `;` *r*~n~ `=` *V*~n~ `;` \...`}` where each *p*~i~\# matches + *V*~i~, and the matching substitution is the union of these + substitutions. +- A variable pattern *x* (identifier other than parameter constructor) + binds the variable *x*. It matches any value *V*, with the + substitution {*x* = *V*}. +- The wild card `_` binds no variables. It matches any value, with the + empty substitution. +- A disjunctive pattern *p* `|` *q* binds the intersection of the + variables bound by *p* and *q*. It matches anything that either *p* + or *q* matches, with the first substitution starting with *p* + matches, from which those variables that are not bound by both + patterns are removed. +- A negative pattern `-` *p* binds no variables. It matches anything + that *p* does *not* match, with the empty substitution. +- An alias pattern *x* `@` *p* binds *x* and all the variables bound + by *p*. It matches any value *V* that *p* matches, with the same + substition extended by {*x* = *V*}. + +The following patterns are only available for the type `Str`: + +- A string literal pattern, e.g. `"s"`, binds no variables. It matches + the same string, with the empty substitution. +- A concatenation pattern, *p* `+` *q*, binds the union of variables + bound by *p* and *q*. It matches any string that consists of a + prefix matching *p* and a suffix matching *q*, with the union of + substitutions corresponding to the first match (see below). +- A repetition pattern *p*`*` binds no variables. It matches any + string that can be decomposed into strings that match *p*, with the + empty substitution. + +The following pattern is only available for the types `Integer` and +`Ints` *n*: + +- An integer literal pattern, e.g. `214`, binds no variables. It + matches the same integer, with the empty substitution. + +All patterns must be **linear**: the same pattern variable may occur +only once in them. This is what makes it straightforward to speak about +unions of binding sets and substitutions. + +Pattern matching is performed in the order in which the branches appear +in the source code: the branch of the first matching pattern is +followed. In concrete syntax, the type checker reject sets of patterns +that are not exhaustive, and warns for completely overshadowed patterns. +It also checks the type correctness of patterns with respect to the +argument type. In abstract syntax, only type correctness is checked, no +exhaustiveness or overshadowing. + +It follows from the definition of record pattern matching that it can +utilize partial records: the branch + + {g = Fem} => t + +in a table of type `{g : Gender ; n : Number} => T` means the same as + + {g = Fem ; n = _} => t + +Variables in regular expression patterns are always bound to the **first +match**, which is the first in the sequence of binding lists. For +example: + +- `x + "e" + y` matches `"peter"` with `x = "p", y = "ter"` +- `x + "er"*` matches `"burgerer"` with `x = "burg"` + + +### Free variation + +An expressions of the form + +`variants` `{`*t*~1~ ; \... ; *t*~n~`}` + +where all *t*~i~ are of the same type *T*, has itseld type *T*. This +expression presents *t*~i~,\...,*t*~n~ as being in **free variation**: +the choice between them is not determined by semantics or parameters. A +limiting case is + +`variants {}` + +which encodes a rule saying that there is no way to express a certain +thing, e.g. that a certain inflectional form does not exist. + +A common wisdom in linguistics is that \"there is no free variation\", +which refers to the situation where *all* aspects are taken into +account. For instance, the English negation contraction could be +expressed as free variation, + + variants {"don't" ; "do" ++ "not"} + +if only semantics is taken into account, but if stylistic aspects are +included, then the proper formulation might be with a parameter +distinguishing between informal and formal style: + + case style of {Informal => "don't" ; Formal => "do" ++ "not"} + +Since there is not way to choose a particular element from a +\`\`variants\` list, free variants is normally not adequate in +libraries, nor in grammars meant for natural language generation. In +application grammars meant to parse user input, free variation is a way +to avoid cluttering the abstract syntax with semantically insignificant +distinctions and even to tolerate some grammatical errors. + +Permitting `variants` in all types involves a major modification of the +semantics of GF expressions. All computation rules have to be lifted to +deal with lists of expressions and values. For instance, + +*t* `!` `variants` `{`*t*~1~ ; \... ; *t*~n~`}` ==\> `variants` `{`*t* +`!` *t*~1~ ; \... ; *t* `!` *t*~n~`}` + +This is done in such a way that variation does not distribute to records +(or other product-like structures). For instance, variants of records, + + variants {{s = "Auto" ; g = Neutr} ; {s = "Wagen" ; g = Masc}} + +is *not* the same as a record of variants, + + {s = variants {"Auto" ; "Wagen"} ; g = variants {Neutr ; Masc}} + +Variants of variants are flattened, + +`variants` `{`\...; `variants` `{`*t*~1~ ;\...; *t*~n~`}` ;\...`}` ==\> +`variants` `{`\...; *t*~1~ ;\...; *t*~n~ ;\...`}` + +and singleton variants are eliminated, + +`variants` `{`*t*`}` ==\> *t* + + +### Local definitions + +A **local definition**, i.e. a **let expression** has the form + +`let` *x* : *T* = *t* `in` *e* + +The type of *x* must be *T*, which also has to be the type of *t*. +Computation is performed by substituting *t* for *x* in *e*: + +`let` *x* : *T* = *t* `in` *e* ==\> *e* {*x* = *t*} + +As syntactic sugar, the type can be omitted if the type checker is able +to infer it: + +`let` *x* = *t* `in` *e* + +It is possible to compress several local definitions into one block: + +`let` *x* : *T* = *t* `;` *y* : *U* = *u* `in` *e* === `let` *x* : *T* = +*t* `in` `let` *y* : *U* = *u* `in` *e* + +Another notational variant is a definition block appearing after the +main expression: + +*e* `where` `{`\...`}` === `let` `{`\...`}` `in` *e* + +Curly brackets are obligatory in the `where` form, and can also be +optionally used in the `let` form. + +Since a block of definitions is treated as syntactic sugar for a nested +`let` expression, a constant must be defined before it is used: the +scope is not mutual, as in a module body. Furthermore, unlike in `lin` +and `oper` definitions, it is *not* possible to bind variables on the +left of the equality sign. + + +### Function applications in concrete syntax + +[]{#functionelimination} + +Fully compiled concrete syntax may not include expressions of function +types except on the outermost level of `lin` rules, as defined +[here](#linexpansion). However, in the source code, and especially in +`oper` definitions, functions are the main vehicle of code reuse and +abstraction. Thus function types and functions follow the same rules as +in abstract syntax, as specified [here](#functiontype). In particular, +the application of a lambda abstract is computed by beta conversion. + +To ensure the elimination of functions, GF uses a special computation +rule for pushing function applications inside tables, since otherwise +run-time variables could block their applications: + +(`table` `{`*p*~1~ `=>` *f*~1~ ; \... ; *p*~n~ `=>` *f*~n~ `}` `!` *e*) +*a* ==\> `table` `{`*p*~1~ `=>` *f*~1~ *a* ; \... ; *p*~n~ `=>` *f*~n~ +*a*`}` `!` *e* + +Also parameter constructors with non-empty contexts, as defined +[here](#paramjudgements), result in expressions in application form. +These expressions are never a problem if their arguments are just +constructors, because they can then be translated to integers +corresponding to the position of the expression in the enumaration of +the values of its type. However, a constructor applied to a run-time +variable may need to be converted as follows: + +*C*\...*x*\... ==\> `case` *x* of `{_ =>` *C*\...*x*`}` + +The resulting expression, when processed by table expansion as explained +[here](#tables), results in *C* being applied to just values of the type +of *x*, and the application thereby disappears. + + +### Reusing top-level grammars as resources + +[]{#reuse} + +*This section is valid for GF 3.0, which abandons the \"lock field\"* +*discipline of GF 2.8.* + +As explained [here](#openabstract), abstract syntax modules can be +opened as interfaces and concrete syntaxes as their instances. This +means that judgements are, as it were, translated in the following way: + +- `cat` *C* *G* ===\> `oper` *C* : `Type` +- `fun` *f* : *T* ===\> `oper` *f* : *T* +- `lincat` *C* = *T* ===\> `oper` *C* : `Type` = *C* +- `lin` *f* = *t* ===\> `oper` *f* = *t* + +Notice that the value *T* of `lincat` definitions is not disclosed in +the translation. This means that the type *C* remains abstract: the only +ways of building an object of type *C* are the operations *f* obtained +from *fun* and *lin* rules. + +The purpose of keeping linearization types abstract is to enforce +**grammar checking via type checking**. This means that any well-typed +operation application is also well-typed in the sense of the original +grammar. If the types were disclosed, then we could for instance easily +confuse all categories that have the linearization type `{s : Str}`. Yet +another reason is that revealing the types makes it impossible for the +library programmers to change their type definitions afterwards. + +Library writers may occasionally want to have access to the values of +linearization types. The way to make it possible is to add an extra +construction operation to a module in which the linearization type is +available: + + oper MkC : T -> C = \x -> x + +In object-oriented terms, the type *C* itself is **protected**, whereas +*MkC* is a **public constructor** of *C*. Of course, it is possible to +make these constructors overloaded (concept explained +[here](#overloading)), to enable easy access to special cases. + + +### Predefined concrete syntax types + +[]{#predefcnc} + +The following concrete syntax types are predefined: + +- `Str`, the type of tokens and token lists (defined [here](#strtype)) +- `Integer`, the type of nonnegative integers +- `Ints` *n*, the type of integers from *0* to *n* +- `Type`, the type of (concrete syntax) types +- `PType`, the type of parameter types + +The last two types are, in a way, extended by user-written grammars, +since new parameter types can be defined in the way shown +[here](#paramjudgements), and every paramater type is also a type. From +the point of view of the values of expressions, however, a `param` +declaration does not extend `PType`, since all parameter types get +compiled to initial segments of integers. + +Notice the difference between the concrete syntax types `Str` and +`Integer` on the one hand, and the abstract syntax categories `String` +and `Int`, on the other. As *concrete syntax* types, the latter are +treated in the same way as any reused categories: their objects can be +formed by using syntax trees (string and integer literals). + +*The type name* `Integer` *replaces in GF 3.0 the name* `Int`, *to avoid +confusion with the abstract syntax type and to be analogous* *with the* +`Str` *vs.* `String` *distinction.* + + +### Predefined concrete syntax operations + +The following predefined operations are defined in the resource module +`prelude/Predef.gf`. Their implementations are defined as a part of the +GF grammar compiler. + + ------------------------------------------------------------------------------------------------- + operation type explanation + -------------- --------------------------------- ------------------------------------------------ + `PBool` `PType` `PTrue | PFalse` + + `Error` `Type` the empty type + + `Int` `Type` the type of integers + + `Ints` `Integer -> Type` the type of integers from 0 to n + + `error` `Str -> Error` forms error message + + `length` `Str -> Int` length of string + + `drop` `Integer -> Str -> Str` drop prefix of length + + `take` `Integer -> Str -> Str` take prefix of length + + `tk` `Integer -> Str -> Str` drop suffix of length + + `dp` `Integer -> Str -> Str` take suffix of length + + `eqInt` `Integer -> Integer -> PBool` test if equal integers + + `lessInt` `Integer -> Integer -> PBool` test order of integers + + `plus` `Integer -> Integer -> Integer` add integers + + `eqStr` `Str -> Str -> PBool` test if equal strings + + `occur` `Str -> Str -> PBool` test if occurs as substring + + `occurs` `Str -> Str -> PBool` test if any char occurs + + `show` `(P : Type) -> P -> Str` convert param to string + + `read` `(P : Type) -> Str -> P` convert string to param + + `toStr` `(L : Type) -> L -> Str` find the \"first\" string + + `nonExist` `Str` a special token marking\ + non-existing morphological forms + + `BIND` `Str` a special token marking\ + that the surrounding tokens should not\ + be separated by space + + `SOFT_BIND` `Str` a special token marking\ + that the surrounding tokens may not\ + be separated by space + + `SOFT_SPACE` `Str` a special token marking\ + that the space between the surrounding tokens\ + is optional + + `CAPIT` `Str` a special token marking\ + that the first character in the next token\ + should be capitalized + + `ALL_CAPIT` `Str` a special token marking\ + that the next word should be\ + in all capital letters + ------------------------------------------------------------------------------------------------- + +Compilation eliminates these operations, and they may therefore not take +arguments that depend on run-time variables. + +The module `Predef` is included in the *opens* list of all modules, and +therefore does not need to be opened explicitly. + + +Flags and pragmas +----------------- + + +### Some flags and their values + +[]{#flagvalues} + +The flag `coding` in concrete syntax sets the **character encoding** +used in the grammar. Internally, GF uses unicode, and `.pgf` files are +always written in UTF8 encoding. The presence of the flag `coding=utf8` +prevents GF from encoding an already encoded file. + +The flag `startcat` in abstract syntax sets the default start category +for parsing, random generation, and any other grammar operation that +depends on category. Its legal values are the categories defined or +inherited in the abstract syntax. + + +### Compiler pragmas + +**Compiler pragmas** are a special form of comments prefixed with `--#`. +Currently GF interprets the following pragmas. + + pragma explanation + -------------- --------------------------------- + `-path=`PATH path list for searching modules + +For instance, the line + + --# -path=.:present:prelude:/home/aarne/GF/tmp + +in the top of `FILE.gf` causes the GF compiler, when invoked on +`FILE.gf`, to search through the current directory (`.`) and the +directories `present`, `prelude`, and `/home/aarne/GF/tmp`, in this +order. If a directory `DIR` is not found relative to the working +directory, `$(GF_LIB_PATH)/DIR` is searched. `$GF_LIB_PATH` can be a +colon-separated list of directories, in which case each directory in the +list contributes to the search path expansion. + + +Alternative grammar input formats +--------------------------------- + +While the GF language as specified in this document is the most +versatile and powerful way of writing GF grammars, there are several +other formats that a GF compiler may make available for users, either to +get started with small grammars or to semiautomatically convert grammars +from other formats to GF. Here are the ones supported by GF 2.8 and 3.0. + + +### Old GF without modules + +[]{#oldgf} + +Before GF compiler version 2.0, there was no module system, and all +kinds of judgement could be written in all files, without any headers. +This format is still available, and the compiler (version 2.8) detects +automatically if a file is in the current or the old format. However, +the old format is not recommended because of pure modularity and missing +separate compilation, and also because libraries are not available, +since the old and the new format cannot be mixed. With version 2.8, +grammars in the old format can be converted to modular grammar with the +command + + > import -o FILE.gf + +which rewrites the grammar divided into three files: an abstract, a +concrete, and a resource module. + + +### Context-free grammars + +A quick way to write a GF grammar is to use the context-free format, +also known as BNF. Files of this form are recognized by the suffix +`.cf`. Rules in these files have the form + +*Label* `.` *Cat* `::=` (*String* \| *Cat*)\* `;` + +where *Label* and *Cat* are identifiers and *String* quoted strings. + +There is a shortcut form generating labels automatically, + +*Cat* `::=` (*String* \| *Cat*)\* `;` + +In the shortcut form, vertical bars (`|`) can be used to give several +right-hand-sides at a time. An empty right-hand side means the singleton +of an empty sequence, and not an empty union. + +Just like old-style GF files (previous section), contex-free grammar +files can be converted to modular GF by using the `-o` option to the +compiler in GF 2.8. + + +### Extended BNF grammars + +Extended BNF (`FILE.ebnf`) goes one step further from the shortcut +notation of previous section. The rules have the form + +*Cat* `::=` *RHS* `;` + +where an *RHS* can be any regular expression built from quoted strings +and category symbols, in the following ways: + + RHS item explanation + ----------------- ------------------------ + *Cat* nonterminal + *String* terminal + *RHS* *RHS* sequence + *RHS* `|` *RHS* alternatives + *RHS* `?` optional + *RHS* `*` repetition + *RHS* `+` non-empty repetition\| + +Parentheses are used to override standard precedences, where `|` binds +weaker than sequencing, which binds weaker than the unary operations. + +The compiler generates not only labels, but also new categories +corresponding to the regular expression combinations actually in use. + +Just like `.cf` files (previous section), `.ebnf` files can be converted +to modular GF by using the `-o` option to the compiler in GF 2.8. + + +### Example-based grammars + +**Example-based grammars** (`.gfe`) provide a way to use resource +grammar libraries without having to know the names of functions in them. +The compiler works as a preprocessor, saving the result in a (`.gf`) +file, which can be compiled as usual. + +If a library is implemented as an abstract and concrete syntax, it can +be used for parsing. Calls of library functions can therefore be formed +by parsing strings in the library. GF has an expression format for this, + +`in` *C* *String* + +where *C* is the category in which to parse (it can be qualified by the +module name) and the string is the input to parser. Expressions of this +form are replaced by the syntax trees that result. These trees are +always type-correct. If several parses are found, all but the first one +are given in comments. + +Here is an example, from `GF/examples/animal/`: + + --# -resource=../../lib/present/LangEng.gfc + --# -path=.:present:prelude + + incomplete concrete QuestionsI of Questions = open Lang in { + lincat + Phrase = Phr ; + Entity = N ; + Action = V2 ; + lin + Who love_V2 man_N = in Phr "who loves men" ; + Whom man_N love_V2 = in Phr "whom does the man love" ; + Answer woman_N love_V2 man_N = in Phr "the woman loves men" ; + } + +The `resource` pragma shows the grammar that is used for parsing the +examples. + +Notice that the variables `love_V2`, `man_N`, etc, are actually +constants in the library. In the resulting rules, such as + + lin Whom = \man_N -> \love_V2 -> + PhrUtt NoPConj (UttQS (UseQCl TPres ASimul PPos + (QuestSlash whoPl_IP (SlashV2 (DetCN (DetSg (SgQuant + DefArt)NoOrd)(UseN man_N)) love_V2)))) NoVoc ; + +those constants are nonetheless treated as variables, following the +normal binding conventions, as stated [here](#renaming). + + +The grammar of GF +----------------- + +The following grammar is actually used in the parser of GF, although we +have omitted some obsolete rules still included in the parser for +backward compatibility reasons. + +This document was automatically generated by the *BNF-Converter*. It was +generated together with the lexer, the parser, and the abstract syntax +module, which guarantees that the document matches with the +implementation of the language (provided no hand-hacking has taken +place). + + +The lexical structure of GF +--------------------------- + + +### Identifiers + +Identifiers *Ident* are unquoted strings beginning with a letter, +followed by any combination of letters, digits, and the characters `_ '` +reserved words excluded. + + +### Literals + +Integer literals *Integer* are nonempty sequences of digits. + +String literals *String* have the form `"`*x*`"`}, where *x* is any +sequence of any characters except `"` unless preceded by `\`. + +Double-precision float literals *Double* have the structure indicated by +the regular expression `digit+ '.' digit+ ('e' ('-')? digit+)?` i.e. +two sequences of digits separated by a decimal point, optionally +followed by an unsigned or negative exponent. + + +### Reserved words and symbols + +The set of reserved words is the set of terminals appearing in the +grammar. Those reserved words that consist of non-letter characters are +called symbols, and they are treated in a different way from those that +are similar to identifiers. The lexer follows rules familiar from +languages like Haskell, C, and Java, including longest match and spacing +conventions. + +The reserved words used in GF are the following: + +- `PType` +- `Str` +- `Strs` +- `Type` +- `abstract` +- `case` +- `cat` +- `concrete` +- `data` +- `def` +- `flags` +- `fun` +- `in` +- `incomplete` +- `instance` +- `interface` +- `let` +- `lin` +- `lincat` +- `lindef` +- `linref` +- `of` +- `open` +- `oper` +- `param` +- `pre` +- `printname` +- `resource` +- `strs` +- `table` +- `transfer` +- `variants` +- `where` +- `with` + +The symbols used in GF are the following: + +- `;` +- `=` +- `:` +- `->` +- `{` +- `}` +- `**` +- `,` +- `(` +- `)` +- `[` +- `]` +- `-` +- `.` +- `|` +- `?` +- `<` +- `>` +- `@` +- `!` +- `*` +- `+` +- `++` +- `\` +- `=>` +- `_` +- `$` +- `/` + +### Comments + +Single-line comments begin with `--`. +Multiple-line comments are enclosed with `{-` and `-}`. + + +The syntactic structure of GF +----------------------------- + +Terminal appear as `code`. +The symbols **->** (production), **|** (union) and **eps** (empty rule) belong to the BNF notation. +All other symbols are non-terminals. + + --------------------- -------- ------------------------------------------------------------------------------------ + *Grammar* **->** *\[ModDef\]* + *\[ModDef\]* **->** **eps** + **|** *ModDef* *\[ModDef\]* + *ModDef* **->** *ModDef* `;` + **|** *ComplMod* *ModType* `=` *ModBody* + *ModType* **->** `abstract` *Ident* + **|** `resource` *Ident* + **|** `interface` *Ident* + **|** `concrete` *Ident* `of` *Ident* + **|** `instance` *Ident* `of` *Ident* + **|** `transfer` *Ident* `:` *Open* `->` *Open* + *ModBody* **->** *Extend* *Opens* `{` *\[TopDef\]* `}` + **|** *\[Included\]* + **|** *Included* `with` *\[Open\]* + **|** *Included* `with` *\[Open\]* `**` *Opens* `{` *\[TopDef\]* `}` + **|** *\[Included\]* `**` *Included* `with` *\[Open\]* + **|** *\[Included\]* `**` *Included* `with` *\[Open\]* `**` *Opens* `{` *\[TopDef\]* `}` + *\[TopDef\]* **->** **eps** + **|** *TopDef* *\[TopDef\]* + *Extend* **->** *\[Included\]* `**` + **|** **eps** + *\[Open\]* **->** **eps** + **|** *Open* + **|** *Open* `,` *\[Open\]* + *Opens* **->** **eps** + **|** `open` *\[Open\]* `in` + *Open* **->** *Ident* + **|** `(` *QualOpen* *Ident* `)` + **|** `(` *QualOpen* *Ident* `=` *Ident* `)` + *ComplMod* **->** **eps** + **|** `incomplete` + *QualOpen* **->** **eps** + *\[Included\]* **->** **eps** + **|** *Included* + **|** *Included* `,` *\[Included\]* + *Included* **->** *Ident* + **|** *Ident* `[` *\[Ident\]* `]` + **|** *Ident* `-` `[` *\[Ident\]* `]` + *Def* **->** *\[Name\]* `:` *Exp* + **|** *\[Name\]* `=` *Exp* + **|** *Name* *\[Patt\]* `=` *Exp* + **|** *\[Name\]* `:` *Exp* `=` *Exp* + *TopDef* **->** `cat` *\[CatDef\]* + **|** `fun` *\[FunDef\]* + **|** `data` *\[FunDef\]* + **|** `def` *\[Def\]* + **|** `data` *\[DataDef\]* + **|** `param` *\[ParDef\]* + **|** `oper` *\[Def\]* + **|** `lincat` *\[PrintDef\]* + **|** `lindef` *\[Def\]* + **|** `linref` *\[Def\]* + **|** `lin` *\[Def\]* + **|** `printname` `cat` *\[PrintDef\]* + **|** `printname` `fun` *\[PrintDef\]* + **|** `flags` *\[FlagDef\]* + *CatDef* **->** *Ident* *\[DDecl\]* + **|** `[` *Ident* *\[DDecl\]* `]` + **|** `[` *Ident* *\[DDecl\]* `]` `{` *Integer* `}` + *FunDef* **->** *\[Ident\]* `:` *Exp* + *DataDef* **->** *Ident* `=` *\[DataConstr\]* + *DataConstr* **->** *Ident* + **|** *Ident* `.` *Ident* + *\[DataConstr\]* **->** **eps** + **|** *DataConstr* + **|** *DataConstr* `|` *\[DataConstr\]* + *ParDef* **->** *Ident* `=` *\[ParConstr\]* + **|** *Ident* `=` `(` `in` *Ident* `)` + **|** *Ident* + *ParConstr* **->** *Ident* *\[DDecl\]* + *PrintDef* **->** *\[Name\]* `=` *Exp* + *FlagDef* **->** *Ident* `=` *Ident* + *\[Def\]* **->** *Def* `;` + **|** *Def* `;` *\[Def\]* + *\[CatDef\]* **->** *CatDef* `;` + **|** *CatDef* `;` *\[CatDef\]* + *\[FunDef\]* **->** *FunDef* `;` + **|** *FunDef* `;` *\[FunDef\]* + *\[DataDef\]* **->** *DataDef* `;` + **|** *DataDef* `;` *\[DataDef\]* + *\[ParDef\]* **->** *ParDef* `;` + **|** *ParDef* `;` *\[ParDef\]* + *\[PrintDef\]* **->** *PrintDef* `;` + **|** *PrintDef* `;` *\[PrintDef\]* + *\[FlagDef\]* **->** *FlagDef* `;` + **|** *FlagDef* `;` *\[FlagDef\]* + *\[ParConstr\]* **->** **eps** + **|** *ParConstr* + **|** *ParConstr* `|` *\[ParConstr\]* + *\[Ident\]* **->** *Ident* + **|** *Ident* `,` *\[Ident\]* + *Name* **->** *Ident* + **|** `[` *Ident* `]` + *\[Name\]* **->** *Name* + **|** *Name* `,` *\[Name\]* + *LocDef* **->** *\[Ident\]* `:` *Exp* + **|** *\[Ident\]* `=` *Exp* + **|** *\[Ident\]* `:` *Exp* `=` *Exp* + *\[LocDef\]* **->** **eps** + **|** *LocDef* + **|** *LocDef* `;` *\[LocDef\]* + *Exp6* **->** *Ident* + **|** *Sort* + **|** *String* + **|** *Integer* + **|** *Double* + **|** `?` + **|** `[` `]` + **|** `data` + **|** `[` *Ident* *Exps* `]` + **|** `[` *String* `]` + **|** `{` *\[LocDef\]* `}` + **|** `<` *\[TupleComp\]* `>` + **|** `<` *Exp* `:` *Exp* `>` + **|** `(` *Exp* `)` + *Exp5* **->** *Exp5* `.` *Label* + **|** *Exp6* + *Exp4* **->** *Exp4* *Exp5* + **|** `table` `{` *\[Case\]* `}` + **|** `table` *Exp6* `{` *\[Case\]* `}` + **|** `table` *Exp6* `[` *\[Exp\]* `]` + **|** `case` *Exp* `of` `{` *\[Case\]* `}` + **|** `variants` `{` *\[Exp\]* `}` + **|** `pre` `{` *Exp* `;` *\[Altern\]* `}` + **|** `strs` `{` *\[Exp\]* `}` + **|** *Ident* `@` *Exp6* + **|** *Exp5* + *Exp3* **->** *Exp3* `!` *Exp4* + **|** *Exp3* `*` *Exp4* + **|** *Exp3* `**` *Exp4* + **|** *Exp4* + *Exp1* **->** *Exp2* `+` *Exp1* + **|** *Exp2* + *Exp* **->** *Exp1* `++` *Exp* + **|** `\` *\[Bind\]* `->` *Exp* + **|** `\` `\` *\[Bind\]* `=>` *Exp* + **|** *Decl* `->` *Exp* + **|** *Exp3* `=>` *Exp* + **|** `let` `{` *\[LocDef\]* `}` `in` *Exp* + **|** `let` *\[LocDef\]* `in` *Exp* + **|** *Exp3* `where` `{` *\[LocDef\]* `}` + **|** `in` *Exp5* *String* + **|** *Exp1* + *Exp2* **->** *Exp3* + *\[Exp\]* **->** **eps** + **|** *Exp* + **|** *Exp* `;` *\[Exp\]* + *Exps* **->** **eps** + **|** *Exp6* *Exps* + *Patt2* **->** `_` + **|** *Ident* + **|** *Ident* `.` *Ident* + **|** *Integer* + **|** *Double* + **|** *String* + **|** `{` *\[PattAss\]* `}` + **|** `<` *\[PattTupleComp\]* `>` + **|** `(` *Patt* `)` + *Patt1* **->** *Ident* *\[Patt\]* + **|** *Ident* `.` *Ident* *\[Patt\]* + **|** *Patt2* `*` + **|** *Ident* `@` *Patt2* + **|** `-` *Patt2* + **|** *Patt2* + *Patt* **->** *Patt* `|` *Patt1* + **|** *Patt* `+` *Patt1* + **|** *Patt1* + *PattAss* **->** *\[Ident\]* `=` *Patt* + *Label* **->** *Ident* + **|** `$` *Integer* + *Sort* **->** `Type` + **|** `PType` + **|** `Str` + **|** `Strs` + *\[PattAss\]* **->** **eps** + **|** *PattAss* + **|** *PattAss* `;` *\[PattAss\]* + *\[Patt\]* **->** *Patt2* + **|** *Patt2* *\[Patt\]* + *Bind* **->** *Ident* + **|** `_` + *\[Bind\]* **->** **eps** + **|** *Bind* + **|** *Bind* `,` *\[Bind\]* + *Decl* **->** `(` *\[Bind\]* `:` *Exp* `)` + **|** *Exp4* + *TupleComp* **->** *Exp* + *PattTupleComp* **->** *Patt* + *\[TupleComp\]* **->** **eps** + **|** *TupleComp* + **|** *TupleComp* `,` *\[TupleComp\]* + *\[PattTupleComp\]* **->** **eps** + **|** *PattTupleComp* + **|** *PattTupleComp* `,` *\[PattTupleComp\]* + *Case* **->** *Patt* `=>` *Exp* + *\[Case\]* **->** *Case* + **|** *Case* `;` *\[Case\]* + *Altern* **->** *Exp* `/` *Exp* + *\[Altern\]* **->** **eps** + **|** *Altern* + **|** *Altern* `;` *\[Altern\]* + *DDecl* **->** `(` *\[Bind\]* `:` *Exp* `)` + **|** *Exp6* + *\[DDecl\]* **->** **eps** + **|** *DDecl* *\[DDecl\]* + --------------------- -------- ------------------------------------------------------------------------------------