mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-09 13:09:33 -06:00
2788 lines
107 KiB
Markdown
2788 lines
107 KiB
Markdown
---
|
||
title: GF Language Reference Manual
|
||
author: Aarne Ranta, Krasimir Angelov
|
||
date: June 2014, GF 3.6
|
||
show-toc: true
|
||
---
|
||
|
||
This document is a reference manual to the GF programming language. GF,
|
||
Grammatical Framework, is a special-purpose programming language,
|
||
designed to support definitions of grammars.
|
||
|
||
This document is not an introduction to GF; such introduction can be
|
||
found in the GF tutorial available on line on the GF web page,
|
||
|
||
[`grammaticalframework.org`](http://grammaticalframework.org)
|
||
|
||
This manual covers only the language, not the GF compiler or interactive
|
||
system. We will however make some references to different compiler
|
||
versions, if they involve changes of behaviour having to do with the
|
||
language specification.
|
||
|
||
This manual is meant to be fully compatible with GF version 3.0. Main
|
||
discrepancies with version 2.8 are indicated, as well as with the
|
||
reference article on GF,
|
||
|
||
A. Ranta, \"Grammatical Framework. A Type Theoretical Grammar
|
||
Formalism\", *The Journal of Functional Programming* 14(2), 2004, pp.
|
||
145-189.
|
||
|
||
This article will referred to as \"the JFP article\".
|
||
|
||
As metalinguistic notation, we will use the symbols
|
||
|
||
- *a* === *b* to say that *a* is syntactic sugar for *b*
|
||
- *a* ==\> *b* to say that *a* is computed (or compiled) to *b*
|
||
|
||
|
||
Overview of GF
|
||
--------------
|
||
|
||
GF is a typed functional language, borrowing many of its constructs from
|
||
ML and Haskell: algebraic datatypes, higher-order functions, pattern
|
||
matching. The module system bears resemblance to ML (functors) but also
|
||
to object-oriented languages (inheritance). The type theory used in the
|
||
abstract syntax part of GF is inherited from logical frameworks, in
|
||
particular ALF (\"Another Logical Framework\"; in a sense, GF is Yet
|
||
Another ALF). From ALF comes also the use of dependent types, including
|
||
the use of explicit type variables instead of Hindley-Milner
|
||
polymorphism.
|
||
|
||
The look and feel of GF is close to Java and C, due to the use of curly
|
||
brackets and semicolons in structuring the code; the expression syntax,
|
||
however, follows Haskell in using juxtaposition for function application
|
||
and parentheses only for grouping.
|
||
|
||
To understand the constructs of GF, and especially their limitations in
|
||
comparison to general-purpose programming languages, it is essential to
|
||
keep in mind that GF is a special-purpose and non-turing-complete
|
||
language. Every GF program is ultimately compiled to a **multilingual
|
||
grammar**, which consists of an **abstract syntax** and a set of
|
||
**concrete syntaxes**. The abstract syntax defines a system of **syntax
|
||
trees**, and each concrete syntax defines a mapping from those syntax
|
||
trees to **nested tuples** of strings and integers. This mapping is
|
||
**compositional**, i.e. **homomorphic**, and moreover **reversible**:
|
||
given a nested tuple, there exists an effective way of finding the set
|
||
of syntax trees that map to this tuple. The procedure of applying the
|
||
mapping to a tree to produce a tuple is called **linearization**, and
|
||
the reverse search procedure is called **parsing**. It is ultimately the
|
||
requirement of reversibility that restricts GF to be less than
|
||
turing-complete. This is reflected in restrictions to recursion in
|
||
concrete syntax. Tree formation in abstract syntax, however, is fully
|
||
recursive.
|
||
|
||
Even though run-time GF grammars manipulate just nested tuples, at
|
||
compile time these are represented by the more fine-grained labelled
|
||
records and finite functions over algebraic datatypes. This enables the
|
||
programmer to write on a higher abstraction level, and also adds type
|
||
distinctions and hence raises the level of checking of programs.
|
||
|
||
|
||
The module system
|
||
-----------------
|
||
|
||
|
||
### Top-level and supplementary module structure
|
||
|
||
The big picture of GF as a programming language for multilingual
|
||
grammars explains its principal module structure. Any GF grammar must
|
||
have an abstract syntax module; it can in addition have any number of
|
||
concrete syntax modules matching that abstract syntax. Before going to
|
||
details, we give a simple example: a module defining the **category**
|
||
`A` of adjectives and one adjective-forming **function**, the zero-place
|
||
function `Even`. We give the module the name `Adj`. The GF code for the
|
||
module looks as follows:
|
||
|
||
abstract Adj = {
|
||
cat A ;
|
||
fun Even : A ;
|
||
}
|
||
|
||
Here are two concrete syntax modules, one intended for mapping the trees
|
||
to English, the other to Swedish. The mapping is defined by `lincat`
|
||
definitions assigning a **linearization type** to each category, and
|
||
`lin` definitions assigning a **linearization** to each function.
|
||
|
||
concrete AdjEng of Adj = {
|
||
lincat A = {s : Str} ;
|
||
lin Even = {s = "even"} ;
|
||
}
|
||
|
||
concrete AdjSwe of Adj = {
|
||
lincat A = {s : AForm => Str} ;
|
||
lin Even = {s = table {
|
||
ASg Utr => "jämn" ;
|
||
ASg Neutr => "jämnt" ;
|
||
APl => "jämna"
|
||
}
|
||
} ;
|
||
param AForm = ASg Gender | APl ;
|
||
param Gender = Utr | Neutr ;
|
||
}
|
||
|
||
These examples illustrate the main ideas of multilingual grammars:
|
||
|
||
- the concrete syntax must match the abstract syntax:
|
||
- every `cat` is given a `lincat`
|
||
- every `fun` is given a `lin`
|
||
|
||
<!-- -->
|
||
|
||
- the concrete syntax is internally coherent:
|
||
- the `lin` rules respect the types defined by `lincat` rules
|
||
|
||
<!-- -->
|
||
|
||
- concrete syntaxes are independent of each other
|
||
- they can use different `lincat` and `lin` definitions
|
||
- they can define their own **parameter types** (`param`)
|
||
|
||
The first two ideas form the core of the **static checking** of GF
|
||
grammars, eliminating the possibility of run-time errors in
|
||
linearization and parsing. The third idea gives GF the expressive power
|
||
needed to map abstract syntax to vastly different languages.
|
||
|
||
Abstract and concrete modules are called **top-level grammar modules**,
|
||
since they are the ones that remain in grammar systems at run time.
|
||
However, in order to support **modular grammar engineering**, GF
|
||
provides much more module structure than strictly required in top-level
|
||
grammars.
|
||
|
||
**Inheritance**, also known as **extension**, means that a module can
|
||
inherit the contents of one or more other modules to which new
|
||
judgements are added, e.g.
|
||
|
||
abstract MoreAdj = Adj ** {
|
||
fun Odd : A ;
|
||
}
|
||
|
||
**Resource modules** define parameter types and **operations** usable in
|
||
several concrete syntaxes,
|
||
|
||
resource MorphoFre = {
|
||
param Number = Sg | Pl ;
|
||
param Gender = Masc | Fem ;
|
||
oper regA : Str -> {s : Gender => Number => Str} =
|
||
\fin -> {
|
||
s = table {
|
||
Masc => table {Sg => fin ; Pl => fin + "s"} ;
|
||
Fem => table {Sg => fin + "e" ; Pl => fin + "es"}
|
||
}
|
||
} ;
|
||
}
|
||
|
||
By **opening**, a module can use the contents of a resource module
|
||
without inheriting them, e.g.
|
||
|
||
concrete AdjFre of Adj = open MorphoFre in {
|
||
lincat A = {s : Gender => Number => Str} ;
|
||
lin Even = regA "pair" ;
|
||
}
|
||
|
||
**Interfaces** and **instances** separate the contents of a resource
|
||
module to type signatures and definitions, in a way analogous to
|
||
abstract vs. concrete modules, e.g.
|
||
|
||
interface Lexicon = {
|
||
oper Adjective : Type ;
|
||
oper even_A : Adjective ;
|
||
}
|
||
|
||
instance LexiconEng of Lexicon = {
|
||
oper Adjective = {s : Str} ;
|
||
oper even_A = {s = "even"} ;
|
||
}
|
||
|
||
**Functors** i.e. **parametrized modules** i.e. **incomplete modules**,
|
||
defining a concrete syntax in terms of an interface.
|
||
|
||
incomplete concrete AdjI of Adj = open Lexicon in {
|
||
lincat A = Adjective ;
|
||
lin Even = even_A ;
|
||
}
|
||
|
||
A functor can be **instantiated** by providing instances of its open
|
||
interfaces.
|
||
|
||
concrete AdjEng of Adj = AdjI with (Lexicon = LexiconEng) ;
|
||
|
||
|
||
### Compilation units
|
||
|
||
The compilation unit of GF source code is a file that contains a module.
|
||
Judgements outside modules are supported only for backward
|
||
compatibility, as explained [here](#oldgf). Every source file, suffixed
|
||
`.gf`, is compiled to a \"GF object file\", suffixed `.gfo` (as of GF
|
||
Version 3.0 and later). For runtime grammar objects used for parsing and
|
||
linearization, a set of `.gfo` files is linked to a single file suffixed
|
||
`.pgf`. While `.gf` and `.gfo` files may contain modules of any kinds, a
|
||
`.pgf` file always contains a multilingual grammar with one abstract and
|
||
a set of concrete syntaxes.
|
||
|
||
The following diagram summarizes the files involved in the compilation
|
||
process.
|
||
|
||
`module1.gf module2.gf ... modulen.gf`
|
||
|
||
==\>
|
||
|
||
`module1.gfo module2.gfo ... modulen.gfo`
|
||
|
||
==\>
|
||
|
||
grammar.pgf
|
||
|
||
Both `.gf` and `.gfo` files are written in the GF source language;
|
||
`.pgf` files are written in a lower-level format. The process of
|
||
translating `.gf` to `.gfo` consists of **name resolution**, **type
|
||
annotation**, **partial evaluation**, and **optimization**. There is a
|
||
great advantage in the possibility to do this separately for GF modules
|
||
and saving the result in `.gfo` files. The partial evaluation phase, in
|
||
particular, is time and memory consuming, and GF libraries are therefore
|
||
distributed in `.gfo` to make their use less arduous.
|
||
|
||
*In GF before version 3.0, the object files are in a format called
|
||
`.gfc`,* *and the multilingual runtime grammar is in a format called
|
||
`.gfcm`.*
|
||
|
||
The standard compiler has a built-in **make facility**, which finds out
|
||
what other modules are needed when compiling an explicitly given module.
|
||
This facility builds a dependency graph and decides which of the
|
||
involved modules need recompilation (from `.gf` to `.gfo`), and for
|
||
which the GF object can be used directly.
|
||
|
||
|
||
### Names
|
||
|
||
Each module *M* defines a set of **names**, which are visible in *M*
|
||
itself, in all modules extending *M* (unless excluded, as explained
|
||
[here](#restrictedinheritance)), and all modules opening *M*. These
|
||
names can stand for abstract syntax categories and functions, parameter
|
||
types and parameter constructors, and operations. All these names live
|
||
in the same **name space**, which means that a name entering a module
|
||
more than once due to inheritance or opening can lead to a **conflict**.
|
||
It is specified [here](#renaming) how these conflicts are resolved.
|
||
|
||
The names of modules live in a name space separate from the other names.
|
||
Even here, all names must be distinct in a set of files compiled to a
|
||
multilingual grammar. In particular, even files residing in different
|
||
directories must have different names, since GF has no notion of
|
||
hierarchic module names.
|
||
|
||
Lexically, names belong to the class of **identifiers**. An idenfifier
|
||
is a letter followed by any number of letters, digits, undercores (`_`)
|
||
and primes (`'`). Upper- and lower-case letters are treated as distinct.
|
||
Nothing dictates the choice of upper or lower-case initials, but the
|
||
standard libraries follow conventions similar to Haskell:
|
||
|
||
- upper case is used for modules, abstract syntax categories and
|
||
functions, parameter types and constructors, and type synonyms
|
||
- lower case is used for non-type-valued operations and for variables
|
||
|
||
[]{#identifiers}
|
||
|
||
\"Letters\" as mentioned in the identifier syntax include all 7-bit
|
||
ASCII letters. Iso-latin-1 and Unicode letters are supported in varying
|
||
degrees by different tools and platforms, and are hence not recommended
|
||
in identifiers.
|
||
|
||
|
||
### The structure of a module
|
||
|
||
Modules of all types have the following structure:
|
||
|
||
*moduletype* *name* `=` *extends* *opens* *body*
|
||
|
||
The part of the module preceding the body is its **header**. The header
|
||
defines the type of the module and tells what other modules it inherits
|
||
and opens. The body consists of the judgements that introduce all the
|
||
new names defined by the module.
|
||
|
||
Any of the parts *extends*, *opens*, and *body* may be empty. If they
|
||
are all filled, delimiters and keywords separate the parts in the
|
||
following way:
|
||
|
||
*moduletype* *name* `=` *extends* `**` `open` *opens* `in` `{` *body*
|
||
`}`
|
||
|
||
The part *moduletype* *name* looks slightly different if the type is
|
||
`concrete` or `instance`: the *name* intrudes between the type keyword
|
||
and the name of the module being implemented and which really belongs to
|
||
the type of the module:
|
||
|
||
`concrete` *name* `of` *abstractname*
|
||
|
||
The only exception to the schema of functor syntax is functor
|
||
instantiations: the instantiation list is given in a special way between
|
||
*extends* and *opens*:
|
||
|
||
`incomplete concrete` *name* `of` *abstractname* `=` *extends* `**`
|
||
*functorname* `with` *instantiations* `**` `open` *opens* `in` `{`
|
||
*body* `}`
|
||
|
||
Logically, the part \"*functorname* `with` *instantiations*\" should
|
||
really be one of the *extends*. This is also shown by the fact that it
|
||
can have restricted inheritance (concept defined
|
||
[here](#restrictedinheritance)).
|
||
|
||
|
||
### Module types, headers, and bodies
|
||
|
||
The *extends* and *opens* parts of a module header are lists of module
|
||
names (with possible qualifications, as defined below
|
||
[here](#qualifiednames)). The first step of type checking a module
|
||
consists of verifying that these names stand for modules of approptiate
|
||
module types. As a rule of thumb,
|
||
|
||
- the *extends* of a module must have the same *moduletype*
|
||
- the *opens* of a module must be of type `resource`
|
||
|
||
However, the precise rules are a little more fine-grained, because of
|
||
the presence of interfaces and their instances, and the possibility to
|
||
reuse abstract and concrete modules as resources. The following table
|
||
gives, for all module types, the possible module types of their
|
||
*extends* and *opens*, as well as the forms of judgement legal in that
|
||
module type.
|
||
|
||
module type extends opens body
|
||
--------------------------- ------------ ------------ ----------------------------
|
||
`abstract` abstract \- `cat, fun, def, data`
|
||
`concrete of` *abstract* concrete resource\* `lincat, cat, oper, param`
|
||
`resource` resource\* resource\* `oper, param`
|
||
`interface` resource+ resource\* `oper, param`
|
||
`instance of` *interface* resource\* resource\* `oper, param`
|
||
`incomplete` concrete concrete+ resource+ `lincat, cat, oper, param`
|
||
|
||
The table uses the following shorthands for lists of module types:
|
||
|
||
- resource\*: resource, instance, concrete
|
||
- resource+: resource\*, interface, abstract
|
||
- concrete+: concrete, incomplete concrete
|
||
|
||
The legality of judgements in the body is checked before the judgements
|
||
themselves are checked.
|
||
|
||
The forms of judgement are explained [here](#judgementforms).
|
||
|
||
|
||
### Digression: the logic of module types
|
||
|
||
Why are the legality conditions of opens and extends so complicated? The
|
||
best way to grasp them is probably to consider a simplified logical
|
||
model of the module system, replacing modules by types and functions.
|
||
This model could actually be developed towards treating modules in GF as
|
||
first-class objects; so far, however, this step has not been motivated
|
||
by any practical needs.
|
||
|
||
module object and type
|
||
------------------------------------------ -----------------------
|
||
abstract A = B A = B : type
|
||
concrete C of A = B C = B : A -\> S
|
||
interface I = B I = B : type
|
||
instance J of I = B J = B : I
|
||
incomplete concrete C of A = open I in B C = B : I -\> A -\> S
|
||
concrete K of A = C with (I=J) K = B(J) : A -\> S
|
||
resource R = B R = B : I
|
||
concrete C of A = open R in B C = B(R) : A -\> S
|
||
|
||
A further step of defining modules as first-class objects would use
|
||
GADTs and record types:
|
||
|
||
- an abstract syntax is a Generalized Algebraic Datatype (GADT)
|
||
- the target type `S` of concrete syntax is the type of nested tuples
|
||
over strings and integers
|
||
- an interface is a labelled record type
|
||
- an instance is a record of the type defined by the interface
|
||
- a functor, with a module body opening an interface, is a function on
|
||
its instances
|
||
- the instantiation of a functor is an application of the function to
|
||
some instance
|
||
- a resource is a typed labelled record, putting together an interface
|
||
and an instance of it
|
||
- the body of a module opening a resource is as a function on the
|
||
interface implicit in the resource; this function is immediately
|
||
applied to the instance defined in the resource
|
||
|
||
Slightly unexpectedly, interfaces and instances are easier to understand
|
||
in this way than resources - a resource is, indeed, more complex, since
|
||
it fuses together an interface and an instance.
|
||
|
||
[]{#openabstract}
|
||
|
||
When an abstract is used as an interface and a concrete as its instance,
|
||
they are actually reinterpreted so that they match the model. Then the
|
||
abstract is no longer a GADT, but a system of *abstract* datatypes, with
|
||
a record field of type `Type` for each category, and a function among
|
||
these types for each abstract syntax function. A concrete syntax
|
||
instantiates this record with linearization types and linearizations.
|
||
|
||
|
||
### Inheritance
|
||
|
||
After checking that the *extends* of a module are of appropriate module
|
||
types, the compiler adds the inherited judgements to the judgements
|
||
included in the body. The inherited judgements are not copied entirely,
|
||
but their names with links to the inherited module. Conflicts may arise
|
||
in this process: a name can have two definitions in the combined pool of
|
||
inherited and added judgements. Such a conflict is always an error: GF
|
||
provides no way to redefine an inherited constant.
|
||
|
||
Simple as the definition of a conflict may sound, it has to take care of
|
||
the inheritance hierarchy. A very common pattern of inheritance is the
|
||
**diamond**: inheritance from two modules which themselves inherit a
|
||
common base module. Assume that the base module defines a name `f`:
|
||
|
||
N
|
||
/ \
|
||
M1 M2
|
||
\ /
|
||
Base {f}
|
||
|
||
Now, `N` inherits `f` from both `M1` and `M2`, so is there a conflict?
|
||
The answer in GF is *no*, because the \"two\" `f`\'s are in the end the
|
||
same: the one defined in `Base`. The situation is thus simpler than in
|
||
**multiple inheritance** in languages like C++, because definitions in
|
||
GF are **immutable**: neither `M1` nor `M2` can possibly have changed
|
||
the definition of `f` given in `Base`. In practice, the compiler manages
|
||
inheritance through hierarchy in a very simple way, by just always
|
||
creating a link not to the immediate parent, but the original ancestor;
|
||
this ancestor can be read from the link provided by the immediate
|
||
parent. Here is how links are created from source modules by the
|
||
compiler:
|
||
|
||
Base {f}
|
||
M1 {m1} ===> M1 {Base.f, m1}
|
||
M2 {m2} ===> M2 {Base.f, m2}
|
||
N {n} ===> N {Base.f, M1.m1, M2.m2, n}
|
||
|
||
[]{#restrictedinheritance}
|
||
|
||
Inheritance can be **restricted**. This means that a module can be
|
||
specified as inheriting *only* explicitly listed constants, or all
|
||
constants *except* ones explicitly listed. The syntax uses constant
|
||
names in brackets, prefixed by a minus sign in the case of an exclusion
|
||
list. In the following configuration, N inherits `a,b,c` from `M1`, and
|
||
all names but `d` from `M2`
|
||
|
||
N = M1 {a,b,c}, M2-{d}
|
||
|
||
Restrictions are performed as a part of inheritance linking, module by
|
||
module: the link is created for a constant if and only if it is both
|
||
included in the module and compatible with the restriction. Thus, for
|
||
instance, an inadvertent usage can exclude a constant from one module
|
||
but inherit it from another one. In the following configuration, `f` is
|
||
inherited via `M1`, if `M1` inherits it.
|
||
|
||
N = M1 [a,b,c], M2-[f]
|
||
|
||
Unintended inheritance may cause problems later in compilation, in the
|
||
judgement-level dependency analysis phase. For instance, suppose a
|
||
function `f` has category `C` as its type in `M`, and we only include
|
||
`f`. The exclusion has the effect of creating an ill-formed module:
|
||
|
||
abstract M = {cat C ; fun f : C ;}
|
||
M [f] ===> {fun f : C ;}
|
||
|
||
One might expect inheritance restriction to be transitive: if an
|
||
included constant *b* depends on some other constant *a*, then *a*
|
||
should be included automatically. However, this rule would leave to
|
||
hard-to-detect inheritances. And it could only be applied later in the
|
||
compilation phase, when the compiler has not only collected the names
|
||
defined, but also resolved the names used in definitions.
|
||
|
||
Yet another pitfall with restricted inheritance is that it must be
|
||
stated for each module separately. For instance, a concrete syntax of an
|
||
abstract must exclude all those names that the abstract does, and a
|
||
functor instantiation must replicate all restrictions of the functor.
|
||
|
||
|
||
### Opening
|
||
|
||
Opening makes constants from other modules usable in judgements, without
|
||
inheriting them. This means that, unlike inheritance, opening is not
|
||
transitive.
|
||
|
||
[]{#qualifiednames}
|
||
|
||
Opening cannot be restricted as inheritance can, but it can be
|
||
**qualified**. This means that the names from the opened modules cannot
|
||
be used as such, but only as prefixed by a qualifier and a dot (`.`).
|
||
The qualifier can be any identifier, including the name of the module.
|
||
Here is an example of an *opens* list:
|
||
|
||
open A, (X = XSLTS), (Y = XSLTS), B
|
||
|
||
If `A` defines the constant `a`, it can be accessed by the names
|
||
|
||
a A.a
|
||
|
||
If `XSLTS` defines the constant `x`, it can be accessed by the names
|
||
|
||
X.x Y.x XSLTS.x
|
||
|
||
Thus qualification by real module name is always possible, and one and
|
||
the same module can be qualified in different ways at the same time (the
|
||
latter can be useful if you want to be able to change the
|
||
implementations of some constants to a different resource later). Since
|
||
the qualification with real module name is always possible, it is not
|
||
possible to \"swap\" the names of modules locally:
|
||
|
||
open (A=B), (B=A) -- NOT POSSIBLE!
|
||
|
||
The list of qualifiers names and module names in a module header may
|
||
thus not contain any duplicates.
|
||
|
||
|
||
### Name resolution
|
||
|
||
[]{#renaming}
|
||
|
||
**Name resolution** is the compiler phase taking place after inheritance
|
||
linking. It qualifies all names occurring in the definition parts of
|
||
judgements (that is, just excluding the defined names themselves) with
|
||
the names of the modules they come from. If a name can come from
|
||
different modules (that is, not from their common ancestor), a conflict
|
||
is reported; this decision is hence not dependent on e.g. types, which
|
||
are known only at a later phase.
|
||
|
||
Qualification of names is the main device for avoiding conflicts in name
|
||
resolution. No other information is used, such as priorities between
|
||
modules. However, if a name is defined in different opened modules but
|
||
never used in the module body, a conflict does not arise: conflicts
|
||
arise only when names are used. Also in this respect, opening is thus
|
||
different from inheritance, where conflicts are checked independently of
|
||
use.
|
||
|
||
As usual, inner scope has priority in name resolution. This means that
|
||
if an identifier is in scope as a bound variable, it will not be
|
||
interpreted as a constant, unless qualified by a module name (variable
|
||
bindings are explained [here](#variablebinding)).
|
||
|
||
|
||
### Functor instantiations
|
||
|
||
We have dealt with the principles of module headers, inheritance, and
|
||
names in a general way that applies to all module types. The exception
|
||
is functor instantiations, that have an extra part of the instantiating
|
||
equations, assigning an instance to every interface. Here is a typical
|
||
example, displaying the full generality:
|
||
|
||
concrete FoodsEng of Foods = PhrasesEng **
|
||
FoodsI-[Pizza] with
|
||
(Syntax = SyntaxEng),
|
||
(LexFoods = LexFoodsEng) **
|
||
open SyntaxEng, ParadigmsEng in {
|
||
lin Pizza = mkCN (mkA "Italian") (mkN "pie") ;
|
||
}
|
||
|
||
(The example is modified from Section 5.9 in the GF Tutorial.)
|
||
|
||
The instantiation syntax is similar to qualified *opens*. The
|
||
left-hand-side names must be interfaces, the right-hand-side names their
|
||
instances. (Recall that `abstract` can be use as `interface` and
|
||
`concrete` as its `instance`.) Inheritance from the functor can be
|
||
restricted, typically in the purpose of defining some excluded functions
|
||
in language-specific ways in the module body.
|
||
|
||
|
||
### Completeness
|
||
|
||
(This section refers to the forms of judgement introduced
|
||
[here](#judgementforms).)
|
||
|
||
A `concrete` is complete with respect to an `abstract`, if it contains a
|
||
`lincat` definition for every `cat` declaration, and a `lin` definition
|
||
for every `fun` declaration.
|
||
|
||
The same completeness criterion applies to functor instantiations. It is
|
||
not possible to use a partial functor instantiation, leading to another
|
||
functor.
|
||
|
||
Functors do not need to be complete in the sense concrete modules need.
|
||
The missing definitions can then be provided in the body of each functor
|
||
instantiation.
|
||
|
||
A `resource` is complete, if all its `oper` and `param` judgements have
|
||
a definition part. While a `resource` must be complete, an `interface`
|
||
need not. For an `interface`, it is the definition parts of judgements
|
||
are optional.
|
||
|
||
An `instance` is complete with respect to an `interface`, if it gives
|
||
the definition parts of all `oper` and `param` judgements that are
|
||
omitted in the `interface`. Giving definitions to judgements that have
|
||
already been defined in the `interface` is illegal. Type signatures, on
|
||
the other hand, can be repeated if the same types are used.
|
||
|
||
In addition to completing the definitions in an `interface`, its
|
||
instance may contain other judgements, but these must all be complete
|
||
with definitions.
|
||
|
||
Here is an example of an instance and its interface showing the above
|
||
variations:
|
||
|
||
interface Pos = {
|
||
param Case ; -- no definition
|
||
param Number = Sg | Pl ; -- definition given
|
||
oper Noun : Type = { -- relative definition given
|
||
s : Number => Case => Str
|
||
} ;
|
||
oper regNoun : Str -> Noun ; -- no definition
|
||
}
|
||
|
||
instance PosEng of Pos = {
|
||
param Case = Nom | Gen ; -- definition of Case
|
||
-- Number and Noun inherited
|
||
oper regNoun = \dog -> { -- type of regNoun inherited
|
||
s = table { -- definition of regNoun
|
||
Sg => table {
|
||
Nom => dog
|
||
-- etc
|
||
}
|
||
} ;
|
||
oper house_N : Noun = -- new definition
|
||
regNoun "house" ;
|
||
}
|
||
|
||
|
||
Judgements
|
||
----------
|
||
|
||
|
||
### Overview of the forms of judgement
|
||
|
||
[]{#judgementforms}
|
||
|
||
A module body in GF is a set of **judgements**. Judgements are
|
||
definitions or declarations, sometimes combinations of the two; the
|
||
common feature is that every judgement introduces a name, which is
|
||
available in the module and whenever the module is extended or opened.
|
||
|
||
There are several different **forms of judgement**, identified by
|
||
different **judgement keywords**. Here is a list of all these forms,
|
||
together with syntax descriptions and the types of modules in which each
|
||
form can occur. The table moreover indicates whether the judgement has a
|
||
default value, and whether it contributes to the **name base**, i.e.
|
||
introduces a new name to the scope.
|
||
|
||
judgement where module default base
|
||
----------------------------- ---------------------------- ------------ --------- ------
|
||
`cat` C G G context abstract N/A yes
|
||
`fun` f : A A type abstract N/A yes
|
||
`def` f ps = t f fun, ps patterns, t term abstract yes no
|
||
`data` C = f `|` \... `|` g C cat, f\...g fun abstract yes no
|
||
`lincat` C = T C cat, T type concrete\* yes yes
|
||
`lin` f = t f fun, t term concrete\* no yes
|
||
`lindef` C = t C cat, t term concrete\* yes no
|
||
`linref` C = t C cat, t term concrete\* yes no
|
||
`printname cat` C = t C cat, t term concrete\* yes no
|
||
`printname fun` f = t f fun, t term concrete\* yes no
|
||
`param` P = C`|` \... `|` D C\...D constructors resource\* N/A yes
|
||
`oper` f : T = t T type, t term resource\* N/A yes
|
||
`flags` o = v o flag, v value all yes N/A
|
||
|
||
Judgements that have default values are rarely used, except `lincat` and
|
||
`flags`, which often need values different from the defaults.
|
||
|
||
Introducing a name twice in the same module is an error. In other words,
|
||
all judgements that have a \"yes\" in the name base column, must have
|
||
distinct identifiers on their left-hand sides.
|
||
|
||
All judgement end with semicolons (`;`).
|
||
|
||
In addition to the syntax given in the table, many of the forms have
|
||
syntactic sugar. This sugar will be explained below in connection to
|
||
each form. There are moreover two kinds of syntactic sugar common to all
|
||
forms:
|
||
|
||
- the judgement keyword is shared between consecutive judgements until
|
||
a new keyword appears:
|
||
`keyw J ; K ;` === `keyw J ; keyw K ;`
|
||
- the right-hand sides of colon (`:`) and equality (`=`) can be
|
||
shared, by using comma (`,`) as separator of left-hand sides, which
|
||
must consist of identifiers
|
||
`c,d : T` === `c : T ; d : T ;`
|
||
`c,d = t` === `c = t ; d = t ;`
|
||
|
||
These conventions, like all syntactic sugar, are performed at an early
|
||
compilation phase, directly after parsing. This means that e.g.
|
||
|
||
lin f,g = \x -> x ;
|
||
|
||
can be correct even though `f` and `g` required different function
|
||
types.
|
||
|
||
Within a module, judgements can occur in any order. In particular, a
|
||
name can be used before it is introduced.
|
||
|
||
The explanations of judgement forms refer to the notions of **type** and
|
||
**term** (the latter also called **expression**). These notions will be
|
||
explained in detail [here](#expressions).
|
||
|
||
|
||
### Category declarations, cat
|
||
|
||
[]{#catjudgements}
|
||
|
||
Category declarations
|
||
|
||
`cat` *C* *G*
|
||
|
||
define the **basic types** of abstract syntax. A basic type is formed
|
||
from a category by giving values to all variables in the **context**
|
||
*G*. If the context is empty, the basic type looks the same as the
|
||
category itself. Otherwise, application syntax is used:
|
||
|
||
*C* *a*~1~\...*a*~n~
|
||
|
||
|
||
### Hypotheses and contexts
|
||
|
||
[]{#contexts}
|
||
|
||
A context is a sequence of **hypotheses**, i.e. variable-type pairs. A
|
||
hypothesis is written
|
||
|
||
`(` *x* `:` *T* `)`
|
||
|
||
and a sequence does not have any separator symbols. As syntactic sugar,
|
||
|
||
- variables can share a type,
|
||
`(` *x,y* `:` *T* `)` === `(` *x* `:` *T* `)` `(` *y* `:` *T* `)`
|
||
- a **wildcard** can be used for a variable not occurring in types
|
||
later in the context,
|
||
`(` `_` `:` *T* `)` === `(` *x* `:` *T* `)`
|
||
- if the variable does not occur later, it can be omitted altogether,
|
||
and parentheses are not used,
|
||
*T* === `(` *x* `:` *T* `)`
|
||
But if *T* is more complex than an identifier, it needs parentheses
|
||
to be separated from the rest of the context.
|
||
|
||
An abstract syntax has **dependent types**, if any of its categories has
|
||
a non-empty context.
|
||
|
||
|
||
### Function declarations, fun
|
||
|
||
Function declarations,
|
||
|
||
`fun` *f* `:` *T*
|
||
|
||
define the **syntactic constructors** of abstract syntax. The type *T*
|
||
of *f* is built built from basic types (formed from categories) by using
|
||
the function type constructor `->`. Thus its form is
|
||
|
||
(*x*~1~ `:` *A*~1~) `->` \... `->` (*x*~n~ `:` *A*~n~) `->` *B*
|
||
|
||
where *Ai* are types, called the **argument types**, and *B* is a basic
|
||
type, called the **value type** of *f*. The **value category** of *f* is
|
||
the category that forms the type *B*.
|
||
|
||
A **syntax tree** is formed from *f* by applying it to a full list of
|
||
arguments, so that the result is of a basic type.
|
||
|
||
A **higher-order function** is one that has a function type as an
|
||
argument. The concrete syntax of GF does not support displaying the
|
||
bound variables of functions of higher than second order, but they are
|
||
legal in abstract syntax.
|
||
|
||
An abstract syntax is **context-free**, if it has neither dependent
|
||
types nor higher-order functions. Grammars with context-free abstract
|
||
syntax are an important subclass of GF, with more limited complexity
|
||
than full GF. Whether the *concrete* syntax is context-free in the sense
|
||
of the Chomsky hierarchy is independent of the context-freeness of the
|
||
abstract syntax.
|
||
|
||
|
||
### Function definitions, def
|
||
|
||
Function definitions,
|
||
|
||
`def` *f* *p*~1~ \... *p*~n~ `=` *t*
|
||
|
||
where *f* is a `fun` function and *p*~i~\# are patterns, impose a
|
||
relation of **definitional equality** on abstract syntax trees. They
|
||
form the basis of **computation**, which is used when comparing whether
|
||
two types are equal; this notion is relevant only if the types are
|
||
dependent. Computation can also be used for the **normalization** of
|
||
syntax trees, which applies even in context-free abstract syntax.
|
||
|
||
The set of `def` definitions for *f* can be scattered around the module
|
||
in which *f* is introduced as a function. The compiler builds the set of
|
||
pattern equations in the order in which the equations appear; this order
|
||
is significant in the case of overlapping patterns. All equations must
|
||
appear in the same module in which *f* itself declared.
|
||
|
||
The syntax of patterns will be specified [here](#patternmatching),
|
||
commonly for abstract and concrete syntax. In abstract syntax,
|
||
**constructor patterns** are those of the form
|
||
|
||
*C* *p*~1~ \... *p*~n~
|
||
|
||
where *C* is declared as `data` for some abstract syntax category (see
|
||
next section). A **variable pattern** is either an identifier or a
|
||
wildcard.
|
||
|
||
A common pitfall is to forget to declare a constructor as data, which
|
||
causes it to be interpreted as a variable pattern in definitions.
|
||
|
||
Computation is performed by applying definitions and beta conversions,
|
||
and in general by using **pattern matching**. Computation and pattern
|
||
matching are explained commonly for abstract and concrete syntax
|
||
[here](#patternmatching).
|
||
|
||
In contrast to concrete syntax, abstract syntax computation is
|
||
completely **symbolic**: it does not produce a value, but just another
|
||
term. Hence it is not an error to have incomplete systems of pattern
|
||
equations for a function. In addition, the definitions can be
|
||
**recursive**, which means that computation can fail to terminate; this
|
||
can never happen in concrete syntax.
|
||
|
||
|
||
### Data constructor definitions, data
|
||
|
||
A data constructor definition,
|
||
|
||
`data` *C* `=` *f*~1~ `|` \... `|` *f*~n~
|
||
|
||
defines the functions *f1*\...*fn* to be **constructors** of the
|
||
category *C*. This means that they are recognized as constructor
|
||
patterns when used in function definitions.
|
||
|
||
In order for the data constructor definition to be correct,
|
||
*f*~1~\...*f*~n~ must be functions with *C* as their value category.
|
||
|
||
The complete set of constructors for a category *C* is the union of all
|
||
its data constructor definitions. Thus a category can be \"extended\" by
|
||
new constructors afterwards. However, all these constructor definitions
|
||
must appear in the same module in which the category is itself defined.
|
||
|
||
There is syntactic sugar for declaring a function as a constructor at
|
||
the same time as introducing it:
|
||
|
||
`data` *f* : *A*~1~ `->` \... `->` *A*~n~ `->` *C* *t*~1~ \... *t*~m~
|
||
|
||
===
|
||
|
||
`fun` *f* : *A*~1~ `->` \... `->` *A*~n~ `->` *C* *t*~1~ \... *t*~m~ ;
|
||
`data` *C* = *f*
|
||
|
||
|
||
### The semantic status of an abstract syntax function
|
||
|
||
There are three possible statuses for a function declared in a `fun`
|
||
judgement:
|
||
|
||
- primitive notion: the default status
|
||
- constructor: the function appears on the right-hand side in `data`
|
||
judgement
|
||
- defined: the function has a `def` definition
|
||
|
||
The \"constructor\" and \"defined\" statuses are in contradiction with
|
||
each other, whereas the primitive notion status is overridden by any of
|
||
the two others.
|
||
|
||
This distinction is relevant for the semantics of abstract syntax, not
|
||
for concrete syntax. It shows in the way patterns are treated in
|
||
equations in `def` definitions: a constructor in a pattern matches only
|
||
itself, whereas any other name is treated as a variable pattern, which
|
||
matches anything.
|
||
|
||
|
||
### Linearization type definitions, lincat
|
||
|
||
A linearization type definition,
|
||
|
||
`lincat` *C* `=` *T*
|
||
|
||
defines the type of linearizations of trees whose type has category *C*.
|
||
Type dependences have no effect on the linearization type.
|
||
|
||
The type *T* must be a **legal linearization type**, which means that it
|
||
is a *record type* whose fields have either parameter types, the type
|
||
Str of strings, or table or record types of these. In particular,
|
||
function types may not appear in *T*. A detailed explanation of types in
|
||
concrete syntax will be given [here](#cnctypes).
|
||
|
||
If *K* is the concrete syntax of an abstract syntax *A*, then *K* must
|
||
define the linearization type of all categories declared in *A*.
|
||
However, the definition can be omitted from the source code, in which
|
||
case the default type `{s : Str}` is used.
|
||
|
||
|
||
### Linearization definitions, lin
|
||
|
||
A linearization definition,
|
||
|
||
`lin` *f* `=` *t*
|
||
|
||
defines the linearizations function of function *f*, i.e. the function
|
||
used for linearizing trees formed by *f*.
|
||
|
||
The type of *t* must be the homomorphic image of the type of *f*. In
|
||
other words, if
|
||
|
||
`fun` *f* `:` *A*~1~ `->` \... `->` *A*~n~ `->` *A*
|
||
|
||
then
|
||
|
||
`lin` *f* `:` *A*~1~\* `->` \... `->` *A*~n~\* `->` *A*\*
|
||
|
||
where the type *T*\* is defined as follows depending on *T*:
|
||
|
||
- (*C* *t*~1~ \... *t*~n~)\* = *T*, if `lincat` *C* `=` *T*
|
||
- (*B*~1~ `->` \... `->` *B*~m~ `->` *B*)\* = *B*\*
|
||
`** {$0,...,$m : Str}`
|
||
|
||
The second case is relevant for higher-order functions only. It says
|
||
that the linearization type of the value type is extended by adding a
|
||
string field for each argument types; these fields store the variable
|
||
symbol used for the binding of each variable.
|
||
|
||
[]{#HOAS}
|
||
|
||
Since the arguments of a function argument are treated as bare strings,
|
||
orders higher than the second are irrelevant for concrete syntax.
|
||
|
||
There is syntactic sugar for binding the variables of the linearization
|
||
of a function on the left-hand side:
|
||
|
||
`lin` *f* *p* `=` *t* === `lin` *f* `= \`*p* `->` *t*
|
||
|
||
The pattern *p* must be either a variable or a wildcard (`_`); this is
|
||
what the syntax of lambda abstracts (`\p -> t`) requires.
|
||
|
||
|
||
### Linearization default definitions, lindef
|
||
|
||
[]{#lindefjudgements}
|
||
|
||
A linearization default definition,
|
||
|
||
`lindef` *C* `=` *t*
|
||
|
||
defines the default linearization of category *C*, i.e. the function
|
||
applicable to a string to make it into an object of the linearization
|
||
type of *C*.
|
||
|
||
Linearization defaults are invoked when linearizing variable bindings in
|
||
higher-order abstract syntax. A variable symbol is then presented as a
|
||
string, which must be converted to correct type in order for the
|
||
linearization not to fail with an error.
|
||
|
||
The other use of the defaults is for linearizing metavariables and
|
||
abstract functions without linearization in the concrete syntax. In the
|
||
first case the default linearization is applied to the string `"?X"`
|
||
where `X` is the unique index of the metavariable, and in the second
|
||
case the string is `"[f]"` where `f` is the name of the abstract
|
||
function with missing linearization.
|
||
|
||
Usually, linearization defaults are generated by using the default rule
|
||
that \"uses the symbol itself for every string, and the first value of
|
||
the parameter type for every parameter\". The precise definition is by
|
||
structural recursion on the type:
|
||
|
||
- default(Str,s) = s
|
||
- default(P,s) = \#1(P)
|
||
- default(P =\> T,s) = `\\_ =>` default(T,s)
|
||
- default(`{`\... ; r : R ; \...`}`,s) = `{`\... ; r : default(R,s) ;
|
||
\...`}`
|
||
|
||
The notion of the first value of a parameter type (\#1(P)) is defined
|
||
[below](#paramvalues).
|
||
|
||
|
||
### Linearization reference definitions, linref
|
||
|
||
[]{#linrefjudgements}
|
||
|
||
A linearization reference definition,
|
||
|
||
`linref` *C* `=` *t*
|
||
|
||
defines the reference linearization of category *C*, i.e. the function
|
||
applicable to an object of the linearization type of *C* to make it into
|
||
a string.
|
||
|
||
The reference linearization is always applied to the top-level node of
|
||
the abstract syntax tree. For example when we linearize the tree
|
||
`f x1 x2 .. xn`, then we first apply `f` to its arguments which gives us
|
||
an object of the linearization type of its category. After that we apply
|
||
the reference linearization for the same category to get a string out of
|
||
the object. This is particularly useful when the linearization type of
|
||
*C* contains discontious constituents. In this case usually the
|
||
reference linearization glues the constituents together to produce an
|
||
intuitive linearization string.
|
||
|
||
The reference linearization is also used for linearizing metavariables
|
||
which stand in function position. For example the tree
|
||
`f (? x1 x2 .. xn)` is linearized as follows. Each of the arguments
|
||
`x1 x2 .. xn` is linearized, and after that the reference linearization
|
||
of the its category is applied to the output of the linearization. The
|
||
result is a sequence of `n` strings which are concatenated into a single
|
||
string. The final string is the input to the default linearization of
|
||
the category for the argument of `f`. After applying the default
|
||
linearization we get an object that we could safely pass to `f`.
|
||
|
||
Usually, linearization references are generated by using the rule that
|
||
\"picks the first string in the linearization type\". The precise
|
||
definition is by structural recursion on the type:
|
||
|
||
- reference(Str,o) = Just o
|
||
- reference(P,s) = Nothing
|
||
- reference(P =\> T,o) = reference(T,o ! \#1(P)) \|\| reference(T,o !
|
||
\#2(P)) \|\| \... \|\| reference(T,o ! \#n(P))
|
||
- reference({r1 : R1; \... rn : Rn},o) = reference(R1, o.r1) \|\|
|
||
reference(R2, o.r2) \|\| \... \|\| reference(Rn, o.rn)
|
||
|
||
Here each call to reference returns either `(Just o)` or `Nothing`. When
|
||
we compute the reference for a table or a record then we pick the
|
||
reference for the first expression for which the recursive call gives us
|
||
`Just`. If we get `Nothing` for all of them then the final result is
|
||
`Nothing` too.
|
||
|
||
|
||
### Printname definitions, printname cat and printname fun
|
||
|
||
A category printname definition,
|
||
|
||
`printname cat` *C* `=` *s*
|
||
|
||
defines the printname of category *C*, i.e. the name used in some
|
||
abstract syntax information shown to the user.
|
||
|
||
Likewise, a function printname definition,
|
||
|
||
`printname fun` *f* `=` *s*
|
||
|
||
defines the printname of function *f*, i.e. the name used in some
|
||
abstract syntax information shown to the user.
|
||
|
||
The most common use of printnames is in the interactive syntax editor,
|
||
where printnames are displayed in menus. It is possible e.g. to adapt
|
||
them to each language, or to embed HTML tooltips in them (as is used in
|
||
some HTML-based editor GUIs).
|
||
|
||
Usually, printnames are generated automatically from the symbol and/or
|
||
concrete syntax information.
|
||
|
||
|
||
### Parameter type definitions, param
|
||
|
||
[]{#paramjudgements}
|
||
|
||
A parameter type definition,
|
||
|
||
`param` *P* `=` *C*~1~ *G*~1~ `|` \... `|` *C*~n~ *G*~n~
|
||
|
||
defines a parameter type *P* with the **parameter constructors**
|
||
*C*~1~\...*C*~n~, with their respective contexts *G*~1~\...*G*~n~.
|
||
|
||
[]{#paramtypes}
|
||
|
||
Contexts have the same syntax as in `cat` judgements, explained
|
||
[here](#catjudgements). Since dependent types are not available in
|
||
parameter type definitions, the use of variables is never necessary. The
|
||
types in the context must themselves be **parameter types**, which are
|
||
defined as follows:
|
||
|
||
- Given the judgement `param` *P* \..., *P* is a parameter type.
|
||
- A record type of parameter types is a parameter type.
|
||
- `Ints` *n* (an initial segment of integers) is a parameter type.
|
||
|
||
The names defined by a parameter type definition include both the type
|
||
name *P* and the constructor names *C*~i~. Therefore all these names
|
||
must be distinct in a module.
|
||
|
||
A parameter type may not be recursive, i.e. *P* itself may not occur in
|
||
the contexts of its constructors. This restriction extends to mutual
|
||
recursion: we say that *P* **depends** on the types that occur in the
|
||
contexts of its constructors and on all types that those types depend
|
||
on, and state that *P* may not depend on itself.
|
||
|
||
In an `interface module`, it is possible to declare a parameter type
|
||
without defining it,
|
||
|
||
`param` *P* `;`
|
||
|
||
|
||
### Parameter values
|
||
|
||
[]{#paramvalues}
|
||
|
||
All parameter types are finite, and the GF compiler will internally
|
||
compute them to **lists of parameter values**. These lists are formed by
|
||
traversing the `param` definitions, usually respecting the order of
|
||
constructors in the source code. For records, bibliographical sorting is
|
||
applied. However, both the order of traversal of `param` definitions and
|
||
the order of fields in a record are specified in a compiler-internal
|
||
way, which means that the programmer should not rely on any particular
|
||
order.
|
||
|
||
The order of the list of parameter values can affect the program in two
|
||
cases:
|
||
|
||
- in the default `lindef` definition ([here](#lindefjudgements)), the
|
||
first value is chosen
|
||
- in course-of-value tables ([here](#tables)), the compiler-internal
|
||
order is followed
|
||
|
||
The first usage implies that, if `lindef` definitions are essential for
|
||
the application, they should be given manually. The second usage implies
|
||
that course-of-value tables should be avoided in hand-written GF code.
|
||
|
||
In run-time grammar generation, all parameter values are translated to
|
||
integers denotions positions in these parameter lists.
|
||
|
||
|
||
### Operation definitions, oper
|
||
|
||
An operation definition,
|
||
|
||
`oper` *h* `:` *T* `=` *t*
|
||
|
||
defines an **operation** *h* of type *T*, with the computation rule
|
||
|
||
*h* ==\> *t*
|
||
|
||
The type *T* can be any concrete syntax type, including function types
|
||
of any order. The term *t* must have the type *T*, as defined
|
||
[here](#expressions).
|
||
|
||
As syntactic sugar, the type can be omitted,
|
||
|
||
`oper` *h* `=` *t*
|
||
|
||
which works in two cases
|
||
|
||
- the type can be inferred from *t* (compiler-dependent)
|
||
- the definition occurs in an `instance` and the type is given in the
|
||
`interface`
|
||
|
||
It is also possible to give the type and the definition separately:
|
||
|
||
`oper` *h* `:` *T* ; `oper` *h* `=` *t* === `oper` *h* `:` *T* `=` *t*
|
||
|
||
The order of the type part and the definition part is free, and there
|
||
can be other judgements in between. However, they must occur in the same
|
||
`resource` module for it to be complete (as defined
|
||
[here](#completeness)). In an `interface` module, it is enough to give
|
||
the type.
|
||
|
||
When only the definition is given, it is possible to use a shorthand
|
||
similar to `lin` judgements:
|
||
|
||
`oper` *h* *p* `=` *t* === `oper` *h* `=` `\`*p* `->` *t*
|
||
|
||
The pattern *p* is either a variable or a wildcard (`_`).
|
||
|
||
Operation definitions may not be recursive, not even mutually recursive.
|
||
This condition ensures that functions can in the end be eliminated from
|
||
concrete syntax code (as explained [here](#functionelimination)).
|
||
|
||
|
||
### Operation overloading
|
||
|
||
[]{#overloading}
|
||
|
||
One and the same operation name *h* can be used for different
|
||
operations, which have to have different types. For each call of *h*,
|
||
the type checker selects one of these operations depending on what type
|
||
is expected in the context of the call. The syntax of overloaded
|
||
operation definitions is
|
||
|
||
`oper` *h* `= overload {`*h* : *T*~1~ = *t*~1~ ; \... ; *h* : *T*~n~ =
|
||
*t*~n~`}`
|
||
|
||
Notice that *h* must be the same in all cases. This format can be used
|
||
to give the complete implementation; to give just the types, e.g. in an
|
||
interface, one can use the form
|
||
|
||
`oper` *h* `: overload {`*h* : *T*~1~ ; \... ; *h* : *T*~n~`}`
|
||
|
||
The implementation of this operation typing is given by a judgement of
|
||
the first form. The order of branches need not be the same.
|
||
|
||
|
||
### Flag definitions, flags
|
||
|
||
A flag definition,
|
||
|
||
`flags` *o* `=` *v*
|
||
|
||
sets the value of the flag *o*, to be used when compiling or using the
|
||
module.
|
||
|
||
The flag *o* is an identifier, and the value *v* is either an identifier
|
||
or a quoted string.
|
||
|
||
Flags are a kind of metadata, which do not strictly belong to the GF
|
||
language. For instance, compilers do not necessarily check the
|
||
consistency of flags, or the meaningfulness of their values. The
|
||
inheritance of flags is not well-defined; the only certain rule is that
|
||
flags set in the module body override the settings from inherited
|
||
modules.
|
||
|
||
Here are some flags commonly included in grammars.
|
||
|
||
flag value description module
|
||
------------ -------------------- ---------------------------------- ----------
|
||
`coding` character encoding encoding used in string literals concrete
|
||
`startcat` category default target of parsing abstract
|
||
|
||
The possible values of these flags are specified [here](#flagvalues).
|
||
Note that the `lexer` and `unlexer` flags are deprecated. If you need
|
||
their functionality, you should use supply them to GF shell commands
|
||
like so:
|
||
|
||
put_string -lextext "страви, напої" | parse
|
||
|
||
A summary of their possible values can be found at the [GF shell
|
||
reference](http://www.grammaticalframework.org/doc/gf-shell-reference.html).
|
||
|
||
|
||
Types and expressions
|
||
---------------------
|
||
|
||
|
||
### Overview of expression forms
|
||
|
||
[]{#expressions}
|
||
|
||
Like many dependently typed languages, GF makes no syntactic distinction
|
||
between expressions and types. An illegal use of a type as an expression
|
||
or vice versa comes out as a type error. Whether a variable, for
|
||
instance, stands for a type or an expression value, can only be resolved
|
||
from its context of use.
|
||
|
||
One practical consequence of the common syntax is that global and local
|
||
definitions (`oper` judgements and `let` expressions, respectively) work
|
||
in the same way for types and expressions. Thus it is possible to
|
||
abbreviate a type occurring in a type expression:
|
||
|
||
let A = {s : Str ; b : Bool} in A -> A -> A
|
||
|
||
Type and other expressions have a system of **precedences**. The
|
||
following table summarizes all expression forms, from the highest to the
|
||
lowest precedence. Some expressions are moreover left- or
|
||
right-associative.
|
||
|
||
prec expression example explanation
|
||
--------- ------------------------------------ -----------------------------------
|
||
7 `c` constant or variable
|
||
7 `Type` the type of types
|
||
7 `PType` the type of parameter types
|
||
7 `Str` the type of strings/token lists
|
||
7 `"foo"` string literal
|
||
7 `123` integer literal
|
||
7 `0.123` floating point literal
|
||
7 `?` metavariable
|
||
7 `[]` empty token list
|
||
7 `[C a b]` list category
|
||
7 `["foo bar"]` token list
|
||
7 `{"s : Str ; n : Num}` record type
|
||
7 `{"s = "foo" ; n = Sg}` record
|
||
7 `<Sg,Fem,Gen>` tuple
|
||
7 `<n : Num>` type-annotated expression
|
||
6 left `t.r` projection or qualification
|
||
5 left `f a` function application
|
||
5 `table {Sg => [] ; _ => "xs"}` table
|
||
5 `table P [a ; b ; c]` course-of-values table
|
||
5 `case n of {Sg => [] ; _ => "xs"}` case expression
|
||
5 `variants {"color" ; "colour"}` free variation
|
||
5 `pre {vowel => "an" ; _ => "a"}` prefix-dependent choice
|
||
4 left `t ! v` table selection
|
||
4 left `A * B` tuple type
|
||
4 left `R ** {b : Bool}` record (type) extension
|
||
3 left `t + s` token gluing
|
||
2 left `t ++ s` token list concatenation
|
||
1 right `\x,y -> t` function abstraction (\"lambda\")
|
||
1 right `\\x,y => t` table abstraction
|
||
1 right `(x : A) -> B` dependent function type
|
||
1 right `A -> B` function type
|
||
1 right `P => T` table type
|
||
1 right `let x = v in t` local definition
|
||
1 `t where {x = v}` local definition
|
||
1 `in M.C "foo"` rule by example
|
||
|
||
Any expression in parentheses (`(`*exp*`)`) is in the highest precedence
|
||
class.
|
||
|
||
|
||
### The functional fragment: expressions in abstract syntax
|
||
|
||
[]{#functiontype}
|
||
|
||
The expression syntax is the same in abstract and concrete syntax,
|
||
although only a part of the syntax is actually usable in well-typed
|
||
expressions in abstract syntax. An abstract syntax is essentially used
|
||
for defining a set of types and a set of functions between those types.
|
||
Therefore it needs essentially the **functional fragment** of the
|
||
syntax. This fragment comprises two kinds of types:
|
||
|
||
- **basic types**, of form *C a1\...an* where
|
||
- `cat` *C* (*x*~1~ : *A*~1~)\...(*x*~n~ : *A*~n~), including the
|
||
predefined categories `Int`, `Float`, and `String` explained
|
||
[here](#predefabs)
|
||
- *a*~1~ : *A*~1~,\...,*a*~n~ : *A*~n~{*x*~1~ =
|
||
*a*~1~,\...,*x*~n-1~=*a*~n-1~}
|
||
|
||
<!-- -->
|
||
|
||
- **function types**, of form (*x* : *A*) `->` *B*, where
|
||
- *A* is a type
|
||
- *B* is a type possibly depending on *x* : *A*
|
||
|
||
When defining basic types, we used the notation *t*{*x*~1~ =
|
||
*t*~1~,\...,*x*~n~=*t*~n~} for the **substitution** of values to
|
||
variables. This is a metalevel notation, which denotes a term that is
|
||
formed by replacing the free occurrences of each variable *x*~i~ by
|
||
*t*~i~.
|
||
|
||
These types have six kinds of expressions:
|
||
|
||
- **constants**, *f* : *A* where
|
||
- `fun` *f* : *A*
|
||
|
||
<!-- -->
|
||
|
||
- **literals** for integers, floats, and strings (defined in
|
||
[here](#predefabs))
|
||
|
||
<!-- -->
|
||
|
||
- **variables**, *x* : *A* where
|
||
- *x* has been introduced by a binding
|
||
|
||
<!-- -->
|
||
|
||
- **applications**, *f a* : *B*{*x*=*a*}, where
|
||
- *f* : (*x* : *A*) `->` *B*
|
||
- *a* : *A*
|
||
|
||
<!-- -->
|
||
|
||
- **abstractions**, `\`*x* `->` *b* : (*x* : *A*) `->` *B*, where
|
||
- *b* : *B* possibly depending on *x* : *A*
|
||
|
||
<!-- -->
|
||
|
||
- **metavariables**, `?`, as introduced in intermediate phases of
|
||
incremental type checking; metavariables are not permitted in GF
|
||
source code
|
||
|
||
[]{#variablebinding}
|
||
|
||
The notion of **binding** is defined for occurrences of variables in
|
||
subexpressions as follows:
|
||
|
||
- in (*x* : *A*) `->` *B*, *x* is bound in *B*
|
||
- in `\`*x* `->` *b*, *x* is bound in *b*
|
||
- in `def` *f* *p*~1~ \... *p*~n~ = *t*, any pattern variable
|
||
introduced in any *pi* is bound in *t* (as defined
|
||
[here](#patternmatching))
|
||
|
||
As syntactic sugar, function types have sharing of types and suppression
|
||
of variables, in the same way as contexts (defined [here](#contexts)):
|
||
|
||
- variables can share a type,
|
||
`(` *x,y* `:` *A* `)` `->` *B* === `(` *x* `:` *A* `) -> (` *y* `:`
|
||
*A* `) ->` *B*
|
||
- a **wildcard** can be used for a variable not occurring later in the
|
||
type,
|
||
`(` `_` `:` *A* `) ->` *B* === `(` *x* `:` *T* `) ->` *B*
|
||
- if the variable does not occur later, it can be omitted altogether,
|
||
and parentheses are not used,
|
||
*A* `->` *B* === `(` *\_* `:` *A* `) ->` *B*
|
||
|
||
There is analogous syntactic sugar for constant functions,
|
||
|
||
`\`*\_* `->` *t* === `\`*x* `->` *t*
|
||
|
||
where *x* does not occur in *t*, and for multiple lambda abstractions:
|
||
|
||
`\`*p,q* `->` *t* === `\`*p* `->` `\`*q* `->` *t*
|
||
|
||
where *p* and *q* are variables or wild cards (`_`).
|
||
|
||
|
||
### Conversions
|
||
|
||
Among expressions, there is a relation of **definitional equality**
|
||
defined by four **conversion rules**:
|
||
|
||
- **alpha conversion**: `\`*x* `->` *b* = `\`*y* `->` *b*{*x*=*y*}
|
||
|
||
<!-- -->
|
||
|
||
- **beta conversion**: (`\`*x* `->` *b*) *a* = *b*{*x*=*a*}
|
||
|
||
<!-- -->
|
||
|
||
- **delta conversion**: *f* *a*~1~ \... *a*~n~ = *tg*, if
|
||
- there is a definition `def` *f* *p*~1~ \... *p*~n~ = *t*
|
||
- this definition is the first for *f* that matches the sequence
|
||
*a*~1~ \.... *a*~n~, with the substitution *g*
|
||
|
||
<!-- -->
|
||
|
||
- **eta conversion**: *c* = `\`*x* `->` *c x*, if *c* : (*x* : *A*)
|
||
`->` *B*
|
||
|
||
Pattern matching substitution used in delta conversion is defined
|
||
[here](#patternmatching).
|
||
|
||
An expression is in **beta-eta-normal form** if
|
||
|
||
- it has no subexpressions to which beta conversion applies (beta
|
||
normality)
|
||
- each constant or variable whose type is a function type must be
|
||
**eta-expanded**, i.e. made into an abstract equal to it by eta
|
||
conversion (eta normality)
|
||
|
||
Notice that the iteration of eta expansion would lead to an expression
|
||
not in beta-normal form.
|
||
|
||
|
||
### Syntax trees
|
||
|
||
[]{#syntaxtrees}
|
||
|
||
The **syntax trees** defined by an abstract syntax are well-typed
|
||
expressions of basic types in beta-eta normal form. Linearization
|
||
defined in concrete syntax applies to all and only these expressions.
|
||
|
||
There is also a direct definition of syntax trees, which does not refer
|
||
to beta and eta conversions: keeping in mind that a type always has the
|
||
form
|
||
|
||
(*x*~1~ : *A*~1~) `->` \... `->` (*x*~n~ : *A*~n~) `->` *B*
|
||
|
||
where *Ai* are types and *B* is a basic type, a syntax tree is an
|
||
expression
|
||
|
||
*b* *t*~1~ \... *t*~n~ : *B\'*
|
||
|
||
where
|
||
|
||
- *B\'* is the basic type *B*{*x*~1~ = *t*~1~,\...,*x*~n~ = *t*~n~}
|
||
- `fun` *b* : (*x*~1~ : *A*~1~) `->` \... `->` (*x*~n~ : *A*~n~) `->`
|
||
*B*
|
||
- each *t*~i~ has the form `\`*z*~1~,\...,*z*~m~ `->` *c* where *A*~i~
|
||
is
|
||
(*y*~1~ : *B*~1~) `->` \... `->` (*y*~m~ : *B*~m~) `->` *B*
|
||
|
||
|
||
### Predefined types in abstract syntax
|
||
|
||
[]{#predefabs}
|
||
|
||
GF provides three predefined categories for abstract syntax, with
|
||
predefined expressions:
|
||
|
||
category expressions
|
||
---------- ---------------------------------------
|
||
`Int` integer literals, e.g. `123`
|
||
`Float` floating point literals, e.g. `12.34`
|
||
`String` string literals, e.g. `"foo"`
|
||
|
||
These categories take no arguments, and they can be used as basic types
|
||
in the same way as if they were introduced in `cat` judgements. However,
|
||
it is not legal to define `fun` functions that have any of these types
|
||
as value type: their only well-typed expressions are literals as defined
|
||
in the above table.
|
||
|
||
|
||
### Overview of expressions in concrete syntax
|
||
|
||
[]{#cnctypes}
|
||
|
||
Concrete syntax is about defining mappings from abstract syntax trees to
|
||
**concrete syntax objects**. These objects comprise
|
||
|
||
- records
|
||
- tables
|
||
- strings
|
||
- parameter values
|
||
|
||
Thus functions are not concrete syntax objects; however, the mappings
|
||
themselves are expressed as functions, and the source code of a concrete
|
||
syntax can use functions under the condition that they can be eliminated
|
||
from the final compiled grammar (which they can; this is one of the
|
||
fundamental properties of compilation, as explained in more detail in
|
||
the *JFP* article).
|
||
|
||
Concrete syntax thus has the same function types and expression forms as
|
||
abstract syntax, specified [here](#functiontype). The basic types
|
||
defined by categories (`cat` judgements) are available via grammar reuse
|
||
explained [here](#reuse); this also comprises the predefined categories
|
||
`Float` and `String`.
|
||
|
||
|
||
### Values, canonical forms, and run-time variables
|
||
|
||
In abstract syntax, the conversion rules fiven [here](#conversions)
|
||
define a computational relation among expressions, but there is no
|
||
separate notion of a **value** of computation: the value (the end point)
|
||
of a computation chain is simply an expression to which no more
|
||
conversions apply. In general, we are interested in expressions that
|
||
satisfy the conditions of being syntax trees (as defined
|
||
[here](#syntaxtrees)), but there can be many computationally equivalent
|
||
syntax trees which nonetheless are distinct syntax trees and hence have
|
||
different linearizations. The main use of computation in abstract syntax
|
||
is to compare types in dependent type checking.
|
||
|
||
In concrete syntax, the notion of values is central. At run time, we
|
||
want to compute the values of linearizations; at compile time, we want
|
||
to perform **partial evaluation**, which computes expressions as far as
|
||
possible. To specify what happens in computation we therefore have to
|
||
distinguish between **canonical forms** and other forms of expressions.
|
||
The canonical forms are defined separately for each form of type,
|
||
whereas the other forms may usually produce expressions of any type.
|
||
|
||
[]{#linexpansion} []{#runtimevariables}
|
||
|
||
What is done at compile time is the elimination of any noncanonical
|
||
forms, except for those depending on **run-time variables**. Run-time
|
||
variables are the same as the **argument variables** of linearization
|
||
rules, i.e. the variables *x*~1~,\...,*x*~n~ in
|
||
|
||
`lin` *f* `= \` *x*~1~,\...,*x*~n~ `->` *t*
|
||
|
||
where
|
||
|
||
`fun` *f* `:` (*x*~1~ : *A*~1~) `->` \... `->` (*x*~n~ : *A*~n~) `->`
|
||
*B*
|
||
|
||
Notice that this definition refers to the **eta-expanded** linearization
|
||
term, which has one abstracted variable for each argument type of *f*.
|
||
These variables are not necessarily explicit in GF source code, but
|
||
introduced by the compiler.
|
||
|
||
Since certain expression forms should be eliminated in compilation but
|
||
cannot be eliminated if run-time variables appear in them, errors can
|
||
appear late in compilation. This is an issue with the following
|
||
expression forms:
|
||
|
||
- gluing (`s + t`), defined [here](#gluing)
|
||
- pattern matching on strings, defined [here](#patternmatching)
|
||
- predefined string operations, defined [here](#predefcnc) (those
|
||
taking `Str` arguments)
|
||
|
||
|
||
### Token lists, tokens, and strings
|
||
|
||
[]{#strtype}
|
||
|
||
The most prominent basic type is `Str`, the type of **token lists**.
|
||
This type is often sloppily referred to as the type of **strings**; but
|
||
it should be kept in mind that the objects of `Str` are *lists* of
|
||
strings rather than single strings.
|
||
|
||
Expressions of type `Str` have the following canonical forms:
|
||
|
||
- **tokens**, i.e. **string literals**, in double quotes, e.g. `"foo"`
|
||
- **the empty token list**, `[]`
|
||
- **concatenation**, *s* `++` *t*, where *s,t* : `Str`
|
||
- **prefix-dependent choice**,
|
||
`pre {p1 => s1 ; ... ; pn => sn ; _ => s }`, where
|
||
- *s*, *s*~1~,\...,*s*~n~, *p*~1~,\...,*p*~n~ : `Str`
|
||
|
||
For convenience, the notation is overloaded so that tokens are
|
||
identified with singleton token lists, and there is no separate type of
|
||
tokens (this is a change from the *JFP* article). The notion of a token
|
||
is still important for compilation: all tokens introduced by the grammar
|
||
must be known at compile time. This, in turn, is required by the parsing
|
||
algorithms used for parsing with GF grammars.
|
||
|
||
In addition to string literals, tokens can be formed by a specific
|
||
non-canonical operator:
|
||
|
||
- **gluing**, *s* `+` *t*, where *s,t* : `Str`
|
||
|
||
[]{#gluing}
|
||
|
||
Being noncanonical, gluing is equipped with a computation rule: string
|
||
literals are glued by forming a new string literal, and empty token
|
||
lists can be ignored:
|
||
|
||
- `"foo" + "bar"` ==\> `"foobar"`
|
||
- *t* `+ []` ==\> *t*
|
||
- `[] +` *t* ==\> *t*
|
||
|
||
Since tokens must be known at compile time, the operands of gluing may
|
||
not depend on run-time variables, as defined [here](#runtimevariables).
|
||
|
||
As syntactic sugar, token lists can be given as bracketed string
|
||
literals, where spaces separate tokens:
|
||
|
||
- **token lists**, `["one two three"]` === `"one" ++ "two" ++ "three"`
|
||
|
||
Notice that there are no empty tokens, but the expression `[]` can be
|
||
used in a context requiring a token, in particular in gluing expression
|
||
below. Since `[]` denotes an empty token list, the following computation
|
||
laws are valid:
|
||
|
||
- *t* `++ []` ==\> *t*
|
||
- `[] ++` *t* ==\> *t*
|
||
|
||
Moreover, concatenation and gluing are associative:
|
||
|
||
- s `+` (t `+` u) ==\> s `+` t `+` u
|
||
- s `++` (t `++` u) ==\> s `++` t `++` u
|
||
|
||
For the programmer, associativity and the empty token laws mean that the
|
||
compiler can use them to simplify string expressions. It also means that
|
||
these laws are respected in pattern matching on strings.
|
||
|
||
A prime example of prefix-dependent choice operation is the following
|
||
approximative expression for the English indefinite article:
|
||
|
||
pre {
|
||
("a" | "e" | "i" | "o") => "an" ;
|
||
_ => "a"
|
||
} ;
|
||
|
||
This expression can be computed in the context of a subsequent token:
|
||
|
||
- `pre {p1 => s1 ; ... ; pn => sn ; _ => s } ++ t` ==\>
|
||
- *s*~i~ for the first *i* such that the prefix *p*~i~ matches
|
||
*t*, if it exists
|
||
- *s* otherwise
|
||
|
||
The **matching prefix** is defined by comparing the string with the
|
||
prefix of the token. If the prefix is a variant list of strings, then it
|
||
matches the token if any of the strings in the list matches it.
|
||
|
||
The computation rule can sometimes be applied at compile time, but it
|
||
general, prefix-dependent choices need to be passed to the run-time
|
||
grammar, because they are not given a subsequent token to compare with,
|
||
or because the subsequent token depends on a run-time variable.
|
||
|
||
The prefix-dependent choice expression itself may not depend on run-time
|
||
variables.
|
||
|
||
*There is an older syntax for prefix-dependent choice, namely:
|
||
`pre { s ; s1 / p1 ; ... ; sn / pn}`. This syntax will not accept
|
||
strings as patterns.*
|
||
|
||
*In GF prior to 3.0, a specific type* `Strs` *is used for defining
|
||
prefixes,* *instead of just* `variants` *of* `Str`.
|
||
|
||
|
||
### Records and record types
|
||
|
||
A **record** is a collection of objects of possibly different types,
|
||
accessible by **projections** from the record with **labels** pointing
|
||
to these objects. A record is also itself an object, whose type is a
|
||
**record type**. Record types have the form
|
||
|
||
`{` *r*~1~ : *A*~1~ `;` \... `;` *r*~n~ : *A*~n~ `}`
|
||
|
||
where *n* \>= 0, each *A*~i~ is a type, and the labels *r*~i~ are
|
||
distinct. A record of this type has the form
|
||
|
||
`{` *r*~1~ = *a*~1~ `;` \... `;` *r*~n~ = *a*~n~ `}`
|
||
|
||
where each \#aii : \"Aii. A limiting case is the **empty record type**
|
||
`{}`, which has the object `{}`, the **empty record**.
|
||
|
||
The **fields** of a record type are its parts of the form *r* : *A*,
|
||
also called **typings**. The **fields** of a record are of the form *r*
|
||
= *a*, also called **value assignments**. Value assignments may
|
||
optionally indicate the type, as in *r* : *A* = *a*.
|
||
|
||
The order of fields in record types and records is insignificant: two
|
||
record types (or records) are equal if they have the same fields, in any
|
||
order, and a record is an object of a record type, if it has
|
||
type-correct value assignments for all fields of the record type. The
|
||
latter definition implies the even stronger principle of **record
|
||
subtyping**: a record can have any type that has some subset of its
|
||
fields. This principle is explained further [here](#subtyping).
|
||
|
||
All fields in a record must have distinct labels. Thus it is not
|
||
possible e.g. to \"redefine\" a field \"later\" in a record.
|
||
|
||
Lexically, labels are identifiers (defined [here](#identifiers)). This
|
||
is with the exception of the labels selecting bound variables in the
|
||
linearization of higher-order abstract syntax, which have the form
|
||
`$`*i* for an integer *i*, as specified [here](#HOAS). In source code,
|
||
these labels should not appear in records fields, but only in
|
||
selections.
|
||
|
||
Labels occur only in syntactic positions where they cannot be confused
|
||
with constants or variables. Therefore it is safe to write, as in
|
||
`Prelude`,
|
||
|
||
ss : Str -> {s : Str} = \s -> {s = s} ;
|
||
|
||
A **projection** is an expression of the form
|
||
|
||
*t*.*r*
|
||
|
||
where *t* must be a record and *r* must be a label defined in it. The
|
||
type of the projection is the type of that field. The computation rule
|
||
for projection returns the value assigned to that field:
|
||
|
||
`{` \... `;` *r* = *a* `;` \... `}.`*r* ==\> *a*
|
||
|
||
Notice that the dot notation *t*.*r* is also used for qualified names as
|
||
specified [here](#qualifiednames). This ambiguity follows tradition and
|
||
convenience. It is resolved by the following rules (before type
|
||
checking):
|
||
|
||
1. if *t* is a bound variable or a constant in scope, *t*.*r* is
|
||
type-checked as a projection
|
||
2. otherwise, *t*.*r* is type-checked as a qualified name
|
||
|
||
As syntactic sugar, types and values can be shared:
|
||
|
||
- `{` \... `;` *r,s* : *A* `;` \... `}` === `{` \... `;` *r* : *A* `;`
|
||
*s* : *A* `;` \... `}`
|
||
- `{` \... `;` *r,s* = *a* `;` \... `}` === `{` \... `;` *r* = *a* `;`
|
||
*s* = *a* `;` \... `}`
|
||
|
||
Another syntactic sugar are **tuple types** and **tuples**, which are
|
||
translated by endowing their unlabelled fields by the labels `p1`,
|
||
`p2`,\... in the order of appearance of the fields:
|
||
|
||
- *A*~1~ `*` \... `*` *A*~n~ === `{` `p1` : *A*~1~ `;` \... `;` `pn` :
|
||
*A*~n~ `}`
|
||
- `<`*a*~1~ `,` \... `,` *a*~n~ `>` === `{` `p1` = *a*~1~`;` \... `;`
|
||
`pn` = *a*~n~ `}`
|
||
|
||
A **record extension** is formed by adding fields to a record or a
|
||
record type. The general syntax involves two expressions,
|
||
|
||
*R* `**` *S*
|
||
|
||
The result is a record type or a record with a union of the fields of
|
||
*R* and *S*. It is therefore well-formed if
|
||
|
||
- both *R* and *S* are either records or record types
|
||
- the labels in *R* and *S* are disjoint, if *R* and *S* are record
|
||
types
|
||
|
||
(Since GF version 3.6) If *R* and *S* are record objects, then the
|
||
labels in them need not be disjoint. Labels defined in *S* are then
|
||
given priority, so that record extensions in fact works as **record
|
||
update**. A common pattern of using this feature is
|
||
|
||
lin F x ... = x ** {r = ... x.r ...}
|
||
|
||
where `x` is a record with many fields, just one of which is updated.
|
||
Following the normal binding conditions, `x.r` on the right hand side
|
||
still refers to the old value of the `r` field.
|
||
|
||
|
||
### Subtyping
|
||
|
||
The possibility of having superfluous fields in a record forms the basis
|
||
of the **subtyping** relation. That *A* is a subtype of *B* means that
|
||
*a : A* implies *a : B*. This is clearly satisfied for records with
|
||
superfluous fields:
|
||
|
||
- if *R* is a record type without the label *r*, then *R* `** {` *r* :
|
||
*A* `}` is a subtype of *R*
|
||
|
||
The GF grammar compiler extends subtyping to function types by
|
||
**covariance** and **contravariance**:
|
||
|
||
- covariance: if *A* is a subtype of *B*, then *C* `->` *A* is a
|
||
subtype of *C* `->` *B*
|
||
- contravariance: if *A* is a subtype of *B*, then *B* `->` *C* is a
|
||
subtype of *A* `->` *C*
|
||
|
||
The logic of these rules is natural: if a function is returns a value in
|
||
a subtype, then this value is *a fortiori* in the supertype. If a
|
||
function is defined for some type, then it is *a fortiori* defined for
|
||
any subtype.
|
||
|
||
In addition to the well-known principles of record subtyping and co- and
|
||
contravariance, GF implements subtyping for initial segments of
|
||
integers:
|
||
|
||
- if *m* \< *n*, then `Ints` *m* is a subtype of `Ints` *n*
|
||
- `Ints` *n* is a subtype of `Integer`
|
||
|
||
As the last rule, subtyping is transitive:
|
||
|
||
- if *A* is a subtype of *B* and *B* is a subtype of *C*, then *A* is
|
||
a subtype of *C*.
|
||
|
||
### List categories
|
||
|
||
[]{#lists}
|
||
|
||
Since categories of lists of elements of another category are a common idiom, the following syntactic sugar is available:
|
||
|
||
cat [C] {n}
|
||
|
||
abbreviates a set of three judgements:
|
||
|
||
cat ListC ;
|
||
fun BaseC : C -> ... -> C -> ListC ; --n C’s
|
||
fun ConsC : C -> ListC -> ListC
|
||
|
||
The functions `BaseC` and `ConsC` are automatically generated in the abstract syntax, but their linearizations, as well as the linearization type of `ListC`, must be defined manually. The type expression `[C]` is in all contexts interchangeable with `ListC`.
|
||
|
||
More information on lists in GF can be found [here](https://inariksit.github.io/gf/2021/02/22/lists.html).
|
||
|
||
### Tables and table types
|
||
|
||
[]{#tables}
|
||
|
||
One of the most characteristic constructs of GF is **tables**, also
|
||
called **finite functions**. That these functions are finite means that
|
||
it is possible to finitely enumerate all argument-value pairs; this, in
|
||
turn, is possible because the argument types are finite.
|
||
|
||
A **table type** has the form
|
||
|
||
*P* `=>` *T*
|
||
|
||
where *P* must be a parameter type in the sense defined
|
||
[here](#paramtypes), whereas *T* can be any type.
|
||
|
||
Canonical expressions of table types are **tables**, of the form
|
||
|
||
`table` `{` *V*~1~ `=>` *t*~1~ ; \... ; *V*~n~ `=>` *t*~n~ `}`
|
||
|
||
where *V*~1~,\...,*V*~n~ is the complete list of the parameter values of
|
||
the argument type *P* (defined [here](#paramvalues)), and each *t*~i~ is
|
||
an expression of the value type *T*.
|
||
|
||
In addition to explicit enumerations, tables can be given by **pattern
|
||
matching**,
|
||
|
||
`table` `{`*p*~1~ `=>` *t*~1~ ; \... ; *p*~m~ `=>` *t*~m~`}`
|
||
|
||
where *p*~1~,\....,*p*~m~ is a list of patterns that covers all values
|
||
of type *P*. Each pattern *p*~i~ may bind some variables, on which the
|
||
expression *t*~i~ may depend. A complete account of patterns and pattern
|
||
matching is given [here](#patternmatching).
|
||
|
||
A **course-of-values table** omits the patterns and just lists all
|
||
values. It uses the enumeration of all values of the argument type *P*
|
||
to pair the values with arguments:
|
||
|
||
`table` *P* `[`*t*~1~ ; \... ; *t*~n~`]`
|
||
|
||
This format is not recommended for GF source code, since the ordering of
|
||
parameter values is not specified and therefore a compiler-internal
|
||
decision.
|
||
|
||
The argument type can be indicated in ordinary tables as well, which is
|
||
sometimes helpful for type inference:
|
||
|
||
`table` *P* `{` \... `}`
|
||
|
||
The **selection** operator `!`, applied to a table *t* and to an
|
||
expression *v* of its argument type
|
||
|
||
*t* `!` *v*
|
||
|
||
returns the first pattern matching result from *t* with *v*, as defined
|
||
[here](#patternmatching). The order of patterns is thus significant as
|
||
long as the patterns contain variables or wildcards. When the compiler
|
||
reorders the patterns following the enumeration of all values of the
|
||
argument type, this order no longer matters, because no overlap remains
|
||
between patterns.
|
||
|
||
The GF compiler performs **table expansion**, i.e. an analogue of eta
|
||
expansion defined [here](#conversions), where a table is applied to all
|
||
values to its argument type:
|
||
|
||
*t* : *P* `=>` *T* ==\> `table` *P* `[`*t* `!` *V*~1~ ; \... ; *t* `!`
|
||
*V*~n~`]`
|
||
|
||
As syntactic sugar, one-branch tables can be written in a way similar to
|
||
lambda abstractions:
|
||
|
||
`\\`*p* `=>` *t* === `table {`*p* `=>` *t* `}`
|
||
|
||
where *p* is either a variable or a wildcard (`_`). Multiple bindings
|
||
can be abbreviated:
|
||
|
||
`\\`*p,q* `=>` *t* === `\\`*p* `=>` `\\`*q* `=>` *t*
|
||
|
||
**Case expressions** are syntactic sugar for selections:
|
||
|
||
`case` *e* `of {`\...`}` === `table {`\...`} !` *e*
|
||
|
||
|
||
### Pattern matching
|
||
|
||
[]{#patternmatching}
|
||
|
||
We will list all forms of patterns that can be used in table branches.
|
||
We define their **variable bindings** and **matching substitutions**.
|
||
|
||
We start with the patterns available for all parameter types, as well as
|
||
for the types `Integer` and `Str`.
|
||
|
||
- A constructor pattern *C* *p*~1~\...*p*~n~ binds the union of all
|
||
variables bound in the subpatterns *p*~1~,\...,*p*~n~. It matches
|
||
any value *C* *V*~1~\...*V*~n~ where each *p*~i~\# matches *V*~i~,
|
||
and the matching substitution is the union of these substitutions.
|
||
- A record pattern `{` *r*~1~ `=` *p*~1~ `;` \... `;` *r*~n~ `=`
|
||
*p*~n~ `}` binds the union of all variables bound in the subpatterns
|
||
*p*~1~,\...,*p*~n~. It matches any value `{` *r*~1~ `=` *V*~1~ `;`
|
||
\... `;` *r*~n~ `=` *V*~n~ `;` \...`}` where each *p*~i~\# matches
|
||
*V*~i~, and the matching substitution is the union of these
|
||
substitutions.
|
||
- A variable pattern *x* (identifier other than parameter constructor)
|
||
binds the variable *x*. It matches any value *V*, with the
|
||
substitution {*x* = *V*}.
|
||
- The wild card `_` binds no variables. It matches any value, with the
|
||
empty substitution.
|
||
- A disjunctive pattern *p* `|` *q* binds the intersection of the
|
||
variables bound by *p* and *q*. It matches anything that either *p*
|
||
or *q* matches, with the first substitution starting with *p*
|
||
matches, from which those variables that are not bound by both
|
||
patterns are removed.
|
||
- A negative pattern `-` *p* binds no variables. It matches anything
|
||
that *p* does *not* match, with the empty substitution.
|
||
- An alias pattern *x* `@` *p* binds *x* and all the variables bound
|
||
by *p*. It matches any value *V* that *p* matches, with the same
|
||
substition extended by {*x* = *V*}.
|
||
|
||
The following patterns are only available for the type `Str`:
|
||
|
||
- A string literal pattern, e.g. `"s"`, binds no variables. It matches
|
||
the same string, with the empty substitution.
|
||
- A concatenation pattern, *p* `+` *q*, binds the union of variables
|
||
bound by *p* and *q*. It matches any string that consists of a
|
||
prefix matching *p* and a suffix matching *q*, with the union of
|
||
substitutions corresponding to the first match (see below).
|
||
- A repetition pattern *p*`*` binds no variables. It matches any
|
||
string that can be decomposed into strings that match *p*, with the
|
||
empty substitution.
|
||
|
||
The following pattern is only available for the types `Integer` and
|
||
`Ints` *n*:
|
||
|
||
- An integer literal pattern, e.g. `214`, binds no variables. It
|
||
matches the same integer, with the empty substitution.
|
||
|
||
All patterns must be **linear**: the same pattern variable may occur
|
||
only once in them. This is what makes it straightforward to speak about
|
||
unions of binding sets and substitutions.
|
||
|
||
Pattern matching is performed in the order in which the branches appear
|
||
in the source code: the branch of the first matching pattern is
|
||
followed. In concrete syntax, the type checker reject sets of patterns
|
||
that are not exhaustive, and warns for completely overshadowed patterns.
|
||
It also checks the type correctness of patterns with respect to the
|
||
argument type. In abstract syntax, only type correctness is checked, no
|
||
exhaustiveness or overshadowing.
|
||
|
||
It follows from the definition of record pattern matching that it can
|
||
utilize partial records: the branch
|
||
|
||
{g = Fem} => t
|
||
|
||
in a table of type `{g : Gender ; n : Number} => T` means the same as
|
||
|
||
{g = Fem ; n = _} => t
|
||
|
||
Variables in regular expression patterns are always bound to the **first
|
||
match**, which is the first in the sequence of binding lists. For
|
||
example:
|
||
|
||
- `x + "e" + y` matches `"peter"` with `x = "p", y = "ter"`
|
||
- `x + "er"*` matches `"burgerer"` with `x = "burg"`
|
||
|
||
|
||
### Free variation
|
||
|
||
An expressions of the form
|
||
|
||
`variants` `{`*t*~1~ ; \... ; *t*~n~`}`
|
||
|
||
where all *t*~i~ are of the same type *T*, has itseld type *T*. This
|
||
expression presents *t*~i~,\...,*t*~n~ as being in **free variation**:
|
||
the choice between them is not determined by semantics or parameters. A
|
||
limiting case is
|
||
|
||
`variants {}`
|
||
|
||
which encodes a rule saying that there is no way to express a certain
|
||
thing, e.g. that a certain inflectional form does not exist.
|
||
|
||
A common wisdom in linguistics is that \"there is no free variation\",
|
||
which refers to the situation where *all* aspects are taken into
|
||
account. For instance, the English negation contraction could be
|
||
expressed as free variation,
|
||
|
||
variants {"don't" ; "do" ++ "not"}
|
||
|
||
if only semantics is taken into account, but if stylistic aspects are
|
||
included, then the proper formulation might be with a parameter
|
||
distinguishing between informal and formal style:
|
||
|
||
case style of {Informal => "don't" ; Formal => "do" ++ "not"}
|
||
|
||
Since there is not way to choose a particular element from a
|
||
\`\`variants\` list, free variants is normally not adequate in
|
||
libraries, nor in grammars meant for natural language generation. In
|
||
application grammars meant to parse user input, free variation is a way
|
||
to avoid cluttering the abstract syntax with semantically insignificant
|
||
distinctions and even to tolerate some grammatical errors.
|
||
|
||
Permitting `variants` in all types involves a major modification of the
|
||
semantics of GF expressions. All computation rules have to be lifted to
|
||
deal with lists of expressions and values. For instance,
|
||
|
||
*t* `!` `variants` `{`*t*~1~ ; \... ; *t*~n~`}` ==\> `variants` `{`*t*
|
||
`!` *t*~1~ ; \... ; *t* `!` *t*~n~`}`
|
||
|
||
This is done in such a way that variation does not distribute to records
|
||
(or other product-like structures). For instance, variants of records,
|
||
|
||
variants {{s = "Auto" ; g = Neutr} ; {s = "Wagen" ; g = Masc}}
|
||
|
||
is *not* the same as a record of variants,
|
||
|
||
{s = variants {"Auto" ; "Wagen"} ; g = variants {Neutr ; Masc}}
|
||
|
||
Variants of variants are flattened,
|
||
|
||
`variants` `{`\...; `variants` `{`*t*~1~ ;\...; *t*~n~`}` ;\...`}` ==\>
|
||
`variants` `{`\...; *t*~1~ ;\...; *t*~n~ ;\...`}`
|
||
|
||
and singleton variants are eliminated,
|
||
|
||
`variants` `{`*t*`}` ==\> *t*
|
||
|
||
|
||
### Local definitions
|
||
|
||
A **local definition**, i.e. a **let expression** has the form
|
||
|
||
`let` *x* : *T* = *t* `in` *e*
|
||
|
||
The type of *x* must be *T*, which also has to be the type of *t*.
|
||
Computation is performed by substituting *t* for *x* in *e*:
|
||
|
||
`let` *x* : *T* = *t* `in` *e* ==\> *e* {*x* = *t*}
|
||
|
||
As syntactic sugar, the type can be omitted if the type checker is able
|
||
to infer it:
|
||
|
||
`let` *x* = *t* `in` *e*
|
||
|
||
It is possible to compress several local definitions into one block:
|
||
|
||
`let` *x* : *T* = *t* `;` *y* : *U* = *u* `in` *e* === `let` *x* : *T* =
|
||
*t* `in` `let` *y* : *U* = *u* `in` *e*
|
||
|
||
Another notational variant is a definition block appearing after the
|
||
main expression:
|
||
|
||
*e* `where` `{`\...`}` === `let` `{`\...`}` `in` *e*
|
||
|
||
Curly brackets are obligatory in the `where` form, and can also be
|
||
optionally used in the `let` form.
|
||
|
||
Since a block of definitions is treated as syntactic sugar for a nested
|
||
`let` expression, a constant must be defined before it is used: the
|
||
scope is not mutual, as in a module body. Furthermore, unlike in `lin`
|
||
and `oper` definitions, it is *not* possible to bind variables on the
|
||
left of the equality sign.
|
||
|
||
|
||
### Function applications in concrete syntax
|
||
|
||
[]{#functionelimination}
|
||
|
||
Fully compiled concrete syntax may not include expressions of function
|
||
types except on the outermost level of `lin` rules, as defined
|
||
[here](#linexpansion). However, in the source code, and especially in
|
||
`oper` definitions, functions are the main vehicle of code reuse and
|
||
abstraction. Thus function types and functions follow the same rules as
|
||
in abstract syntax, as specified [here](#functiontype). In particular,
|
||
the application of a lambda abstract is computed by beta conversion.
|
||
|
||
To ensure the elimination of functions, GF uses a special computation
|
||
rule for pushing function applications inside tables, since otherwise
|
||
run-time variables could block their applications:
|
||
|
||
(`table` `{`*p*~1~ `=>` *f*~1~ ; \... ; *p*~n~ `=>` *f*~n~ `}` `!` *e*)
|
||
*a* ==\> `table` `{`*p*~1~ `=>` *f*~1~ *a* ; \... ; *p*~n~ `=>` *f*~n~
|
||
*a*`}` `!` *e*
|
||
|
||
Also parameter constructors with non-empty contexts, as defined
|
||
[here](#paramjudgements), result in expressions in application form.
|
||
These expressions are never a problem if their arguments are just
|
||
constructors, because they can then be translated to integers
|
||
corresponding to the position of the expression in the enumaration of
|
||
the values of its type. However, a constructor applied to a run-time
|
||
variable may need to be converted as follows:
|
||
|
||
*C*\...*x*\... ==\> `case` *x* of `{_ =>` *C*\...*x*`}`
|
||
|
||
The resulting expression, when processed by table expansion as explained
|
||
[here](#tables), results in *C* being applied to just values of the type
|
||
of *x*, and the application thereby disappears.
|
||
|
||
|
||
### Reusing top-level grammars as resources
|
||
|
||
[]{#reuse}
|
||
|
||
*This section is valid for GF 3.0, which abandons the \"[lock field](https://inariksit.github.io/gf/2018/05/25/subtyping-gf.html#lock-fields)\"*
|
||
*discipline of GF 2.8.*
|
||
|
||
As explained [here](#openabstract), abstract syntax modules can be
|
||
opened as interfaces and concrete syntaxes as their instances. This
|
||
means that judgements are, as it were, translated in the following way:
|
||
|
||
- `cat` *C* *G* ===\> `oper` *C* : `Type`
|
||
- `fun` *f* : *T* ===\> `oper` *f* : *T*
|
||
- `lincat` *C* = *T* ===\> `oper` *C* : `Type` = *C*
|
||
- `lin` *f* = *t* ===\> `oper` *f* = *t*
|
||
|
||
Notice that the value *T* of `lincat` definitions is not disclosed in
|
||
the translation. This means that the type *C* remains abstract: the only
|
||
ways of building an object of type *C* are the operations *f* obtained
|
||
from *fun* and *lin* rules.
|
||
|
||
The purpose of keeping linearization types abstract is to enforce
|
||
**grammar checking via type checking**. This means that any well-typed
|
||
operation application is also well-typed in the sense of the original
|
||
grammar. If the types were disclosed, then we could for instance easily
|
||
confuse all categories that have the linearization type `{s : Str}`. Yet
|
||
another reason is that revealing the types makes it impossible for the
|
||
library programmers to change their type definitions afterwards.
|
||
|
||
Library writers may occasionally want to have access to the values of
|
||
linearization types. The way to make it possible is to add an extra
|
||
construction operation to a module in which the linearization type is
|
||
available:
|
||
|
||
oper MkC : T -> C = \x -> x
|
||
|
||
In object-oriented terms, the type *C* itself is **protected**, whereas
|
||
*MkC* is a **public constructor** of *C*. Of course, it is possible to
|
||
make these constructors overloaded (concept explained
|
||
[here](#overloading)), to enable easy access to special cases.
|
||
|
||
|
||
### Predefined concrete syntax types
|
||
|
||
[]{#predefcnc}
|
||
|
||
The following concrete syntax types are predefined:
|
||
|
||
- `Str`, the type of tokens and token lists (defined [here](#strtype))
|
||
- `Integer`, the type of nonnegative integers
|
||
- `Ints` *n*, the type of integers from *0* to *n*
|
||
- `Type`, the type of (concrete syntax) types
|
||
- `PType`, the type of parameter types
|
||
|
||
The last two types are, in a way, extended by user-written grammars,
|
||
since new parameter types can be defined in the way shown
|
||
[here](#paramjudgements), and every paramater type is also a type. From
|
||
the point of view of the values of expressions, however, a `param`
|
||
declaration does not extend `PType`, since all parameter types get
|
||
compiled to initial segments of integers.
|
||
|
||
Notice the difference between the concrete syntax types `Str` and
|
||
`Integer` on the one hand, and the abstract syntax categories `String`
|
||
and `Int`, on the other. As *concrete syntax* types, the latter are
|
||
treated in the same way as any reused categories: their objects can be
|
||
formed by using syntax trees (string and integer literals).
|
||
|
||
*The type name* `Integer` *replaces in GF 3.0 the name* `Int`, *to avoid
|
||
confusion with the abstract syntax type and to be analogous* *with the*
|
||
`Str` *vs.* `String` *distinction.*
|
||
|
||
|
||
### Predefined concrete syntax operations
|
||
|
||
The following predefined operations are defined in the resource module
|
||
`prelude/Predef.gf`. Their implementations are defined as a part of the
|
||
GF grammar compiler.
|
||
|
||
-------------------------------------------------------------------------------------------------
|
||
operation type explanation
|
||
-------------- --------------------------------- ------------------------------------------------
|
||
`PBool` `PType` `PTrue | PFalse`
|
||
|
||
`Error` `Type` the empty type
|
||
|
||
`Int` `Type` the type of integers
|
||
|
||
`Ints` `Integer -> Type` the type of integers from 0 to n
|
||
|
||
`error` `Str -> Error` forms error message
|
||
|
||
`length` `Str -> Int` length of string
|
||
|
||
`drop` `Integer -> Str -> Str` drop prefix of length
|
||
|
||
`take` `Integer -> Str -> Str` take prefix of length
|
||
|
||
`tk` `Integer -> Str -> Str` drop suffix of length
|
||
|
||
`dp` `Integer -> Str -> Str` take suffix of length
|
||
|
||
`eqInt` `Integer -> Integer -> PBool` test if equal integers
|
||
|
||
`lessInt` `Integer -> Integer -> PBool` test order of integers
|
||
|
||
`plus` `Integer -> Integer -> Integer` add integers
|
||
|
||
`eqStr` `Str -> Str -> PBool` test if equal strings
|
||
|
||
`occur` `Str -> Str -> PBool` test if occurs as substring
|
||
|
||
`occurs` `Str -> Str -> PBool` test if any char occurs
|
||
|
||
`show` `(P : Type) -> P -> Str` convert param to string
|
||
|
||
`read` `(P : Type) -> Str -> P` convert string to param
|
||
|
||
`toStr` `(L : Type) -> L -> Str` find the \"first\" string
|
||
|
||
`nonExist` `Str` a special token marking\
|
||
non-existing morphological forms
|
||
|
||
`BIND` `Str` a special token marking\
|
||
that the surrounding tokens should not\
|
||
be separated by space
|
||
|
||
`SOFT_BIND` `Str` a special token marking\
|
||
that the surrounding tokens may not\
|
||
be separated by space
|
||
|
||
`SOFT_SPACE` `Str` a special token marking\
|
||
that the space between the surrounding tokens\
|
||
is optional
|
||
|
||
`CAPIT` `Str` a special token marking\
|
||
that the first character in the next token\
|
||
should be capitalized
|
||
|
||
`ALL_CAPIT` `Str` a special token marking\
|
||
that the next word should be\
|
||
in all capital letters
|
||
-------------------------------------------------------------------------------------------------
|
||
|
||
Compilation eliminates these operations, and they may therefore not take
|
||
arguments that depend on run-time variables.
|
||
|
||
The module `Predef` is included in the *opens* list of all modules, and
|
||
therefore does not need to be opened explicitly.
|
||
|
||
|
||
Flags and pragmas
|
||
-----------------
|
||
|
||
|
||
### Some flags and their values
|
||
|
||
[]{#flagvalues}
|
||
|
||
The flag `coding` in concrete syntax sets the **character encoding**
|
||
used in the grammar. Internally, GF uses unicode, and `.pgf` files are
|
||
always written in UTF8 encoding. The presence of the flag `coding=utf8`
|
||
prevents GF from encoding an already encoded file.
|
||
|
||
The flag `startcat` in abstract syntax sets the default start category
|
||
for parsing, random generation, and any other grammar operation that
|
||
depends on category. Its legal values are the categories defined or
|
||
inherited in the abstract syntax.
|
||
|
||
|
||
### Compiler pragmas
|
||
|
||
**Compiler pragmas** are a special form of comments prefixed with `--#`.
|
||
Currently GF interprets the following pragmas.
|
||
|
||
pragma explanation
|
||
-------------- ---------------------------------
|
||
`-path=`PATH path list for searching modules
|
||
|
||
For instance, the line
|
||
|
||
--# -path=.:present:prelude:/home/aarne/GF/tmp
|
||
|
||
in the top of `FILE.gf` causes the GF compiler, when invoked on
|
||
`FILE.gf`, to search through the current directory (`.`) and the
|
||
directories `present`, `prelude`, and `/home/aarne/GF/tmp`, in this
|
||
order. If a directory `DIR` is not found relative to the working
|
||
directory, `$(GF_LIB_PATH)/DIR` is searched. `$GF_LIB_PATH` can be a
|
||
colon-separated list of directories, in which case each directory in the
|
||
list contributes to the search path expansion.
|
||
|
||
|
||
Alternative grammar input formats
|
||
---------------------------------
|
||
|
||
While the GF language as specified in this document is the most
|
||
versatile and powerful way of writing GF grammars, there are several
|
||
other formats that a GF compiler may make available for users, either to
|
||
get started with small grammars or to semiautomatically convert grammars
|
||
from other formats to GF. Here are the ones supported by GF 2.8 and 3.0.
|
||
|
||
|
||
### Old GF without modules
|
||
|
||
[]{#oldgf}
|
||
|
||
Before GF compiler version 2.0, there was no module system, and all
|
||
kinds of judgement could be written in all files, without any headers.
|
||
This format is still available, and the compiler (version 2.8) detects
|
||
automatically if a file is in the current or the old format. However,
|
||
the old format is not recommended because of pure modularity and missing
|
||
separate compilation, and also because libraries are not available,
|
||
since the old and the new format cannot be mixed. With version 2.8,
|
||
grammars in the old format can be converted to modular grammar with the
|
||
command
|
||
|
||
> import -o FILE.gf
|
||
|
||
which rewrites the grammar divided into three files: an abstract, a
|
||
concrete, and a resource module.
|
||
|
||
|
||
### Context-free grammars
|
||
|
||
A quick way to write a GF grammar is to use the context-free format,
|
||
also known as BNF. Files of this form are recognized by the suffix
|
||
`.cf`. Rules in these files have the form
|
||
|
||
*Label* `.` *Cat* `::=` (*String* \| *Cat*)\* `;`
|
||
|
||
where *Label* and *Cat* are identifiers and *String* quoted strings.
|
||
|
||
There is a shortcut form generating labels automatically,
|
||
|
||
*Cat* `::=` (*String* \| *Cat*)\* `;`
|
||
|
||
In the shortcut form, vertical bars (`|`) can be used to give several
|
||
right-hand-sides at a time. An empty right-hand side means the singleton
|
||
of an empty sequence, and not an empty union.
|
||
|
||
Just like old-style GF files (previous section), contex-free grammar
|
||
files can be converted to modular GF by using the `-o` option to the
|
||
compiler in GF 2.8.
|
||
|
||
|
||
### Extended BNF grammars
|
||
|
||
Extended BNF (`FILE.ebnf`) goes one step further from the shortcut
|
||
notation of previous section. The rules have the form
|
||
|
||
*Cat* `::=` *RHS* `;`
|
||
|
||
where an *RHS* can be any regular expression built from quoted strings
|
||
and category symbols, in the following ways:
|
||
|
||
RHS item explanation
|
||
----------------- ------------------------
|
||
*Cat* nonterminal
|
||
*String* terminal
|
||
*RHS* *RHS* sequence
|
||
*RHS* `|` *RHS* alternatives
|
||
*RHS* `?` optional
|
||
*RHS* `*` repetition
|
||
*RHS* `+` non-empty repetition\|
|
||
|
||
Parentheses are used to override standard precedences, where `|` binds
|
||
weaker than sequencing, which binds weaker than the unary operations.
|
||
|
||
The compiler generates not only labels, but also new categories
|
||
corresponding to the regular expression combinations actually in use.
|
||
|
||
Just like `.cf` files (previous section), `.ebnf` files can be converted
|
||
to modular GF by using the `-o` option to the compiler in GF 2.8.
|
||
|
||
|
||
### Example-based grammars
|
||
|
||
**Example-based grammars** (`.gfe`) provide a way to use resource
|
||
grammar libraries without having to know the names of functions in them.
|
||
The compiler works as a preprocessor, saving the result in a (`.gf`)
|
||
file, which can be compiled as usual.
|
||
|
||
If a library is implemented as an abstract and concrete syntax, it can
|
||
be used for parsing. Calls of library functions can therefore be formed
|
||
by parsing strings in the library. GF has an expression format for this,
|
||
|
||
`in` *C* *String*
|
||
|
||
where *C* is the category in which to parse (it can be qualified by the
|
||
module name) and the string is the input to parser. Expressions of this
|
||
form are replaced by the syntax trees that result. These trees are
|
||
always type-correct. If several parses are found, all but the first one
|
||
are given in comments.
|
||
|
||
Here is an example, from `GF/examples/animal/`:
|
||
|
||
--# -resource=../../lib/present/LangEng.gfc
|
||
--# -path=.:present:prelude
|
||
|
||
incomplete concrete QuestionsI of Questions = open Lang in {
|
||
lincat
|
||
Phrase = Phr ;
|
||
Entity = N ;
|
||
Action = V2 ;
|
||
lin
|
||
Who love_V2 man_N = in Phr "who loves men" ;
|
||
Whom man_N love_V2 = in Phr "whom does the man love" ;
|
||
Answer woman_N love_V2 man_N = in Phr "the woman loves men" ;
|
||
}
|
||
|
||
The `resource` pragma shows the grammar that is used for parsing the
|
||
examples.
|
||
|
||
Notice that the variables `love_V2`, `man_N`, etc, are actually
|
||
constants in the library. In the resulting rules, such as
|
||
|
||
lin Whom = \man_N -> \love_V2 ->
|
||
PhrUtt NoPConj (UttQS (UseQCl TPres ASimul PPos
|
||
(QuestSlash whoPl_IP (SlashV2 (DetCN (DetSg (SgQuant
|
||
DefArt)NoOrd)(UseN man_N)) love_V2)))) NoVoc ;
|
||
|
||
those constants are nonetheless treated as variables, following the
|
||
normal binding conventions, as stated [here](#renaming).
|
||
|
||
|
||
The grammar of GF
|
||
-----------------
|
||
|
||
The following grammar is actually used in the parser of GF, although we
|
||
have omitted some obsolete rules still included in the parser for
|
||
backward compatibility reasons.
|
||
|
||
This document was automatically generated by the *BNF-Converter*. It was
|
||
generated together with the lexer, the parser, and the abstract syntax
|
||
module, which guarantees that the document matches with the
|
||
implementation of the language (provided no hand-hacking has taken
|
||
place).
|
||
|
||
|
||
The lexical structure of GF
|
||
---------------------------
|
||
|
||
|
||
### Identifiers
|
||
|
||
Identifiers *Ident* are unquoted strings beginning with a letter,
|
||
followed by any combination of letters, digits, and the characters `_ '`
|
||
reserved words excluded.
|
||
|
||
|
||
### Literals
|
||
|
||
Integer literals *Integer* are nonempty sequences of digits.
|
||
|
||
String literals *String* have the form `"`*x*`"`}, where *x* is any
|
||
sequence of any characters except `"` unless preceded by `\`.
|
||
|
||
Double-precision float literals *Double* have the structure indicated by
|
||
the regular expression `digit+ '.' digit+ ('e' ('-')? digit+)?` i.e.
|
||
two sequences of digits separated by a decimal point, optionally
|
||
followed by an unsigned or negative exponent.
|
||
|
||
|
||
### Reserved words and symbols
|
||
|
||
The set of reserved words is the set of terminals appearing in the
|
||
grammar. Those reserved words that consist of non-letter characters are
|
||
called symbols, and they are treated in a different way from those that
|
||
are similar to identifiers. The lexer follows rules familiar from
|
||
languages like Haskell, C, and Java, including longest match and spacing
|
||
conventions.
|
||
|
||
The reserved words used in GF are the following:
|
||
|
||
- `PType`
|
||
- `Str`
|
||
- `Strs`
|
||
- `Type`
|
||
- `abstract`
|
||
- `case`
|
||
- `cat`
|
||
- `concrete`
|
||
- `data`
|
||
- `def`
|
||
- `flags`
|
||
- `fun`
|
||
- `in`
|
||
- `incomplete`
|
||
- `instance`
|
||
- `interface`
|
||
- `let`
|
||
- `lin`
|
||
- `lincat`
|
||
- `lindef`
|
||
- `linref`
|
||
- `of`
|
||
- `open`
|
||
- `oper`
|
||
- `param`
|
||
- `pre`
|
||
- `printname`
|
||
- `resource`
|
||
- `strs`
|
||
- `table`
|
||
- `transfer`
|
||
- `variants`
|
||
- `where`
|
||
- `with`
|
||
|
||
The symbols used in GF are the following:
|
||
|
||
- `;`
|
||
- `=`
|
||
- `:`
|
||
- `->`
|
||
- `{`
|
||
- `}`
|
||
- `**`
|
||
- `,`
|
||
- `(`
|
||
- `)`
|
||
- `[`
|
||
- `]`
|
||
- `-`
|
||
- `.`
|
||
- `|`
|
||
- `?`
|
||
- `<`
|
||
- `>`
|
||
- `@`
|
||
- `!`
|
||
- `*`
|
||
- `+`
|
||
- `++`
|
||
- `\`
|
||
- `=>`
|
||
- `_`
|
||
- `$`
|
||
- `/`
|
||
|
||
### Comments
|
||
|
||
Single-line comments begin with `--`.
|
||
Multiple-line comments are enclosed with `{-` and `-}`.
|
||
|
||
|
||
The syntactic structure of GF
|
||
-----------------------------
|
||
|
||
Terminal appear as `code`.
|
||
The symbols **->** (production), **|** (union) and **eps** (empty rule) belong to the BNF notation.
|
||
All other symbols are non-terminals.
|
||
|
||
--------------------- -------- ------------------------------------------------------------------------------------
|
||
*Grammar* **->** *\[ModDef\]*
|
||
*\[ModDef\]* **->** **eps**
|
||
**|** *ModDef* *\[ModDef\]*
|
||
*ModDef* **->** *ModDef* `;`
|
||
**|** *ComplMod* *ModType* `=` *ModBody*
|
||
*ModType* **->** `abstract` *Ident*
|
||
**|** `resource` *Ident*
|
||
**|** `interface` *Ident*
|
||
**|** `concrete` *Ident* `of` *Ident*
|
||
**|** `instance` *Ident* `of` *Ident*
|
||
**|** `transfer` *Ident* `:` *Open* `->` *Open*
|
||
*ModBody* **->** *Extend* *Opens* `{` *\[TopDef\]* `}`
|
||
**|** *\[Included\]*
|
||
**|** *Included* `with` *\[Open\]*
|
||
**|** *Included* `with` *\[Open\]* `**` *Opens* `{` *\[TopDef\]* `}`
|
||
**|** *\[Included\]* `**` *Included* `with` *\[Open\]*
|
||
**|** *\[Included\]* `**` *Included* `with` *\[Open\]* `**` *Opens* `{` *\[TopDef\]* `}`
|
||
*\[TopDef\]* **->** **eps**
|
||
**|** *TopDef* *\[TopDef\]*
|
||
*Extend* **->** *\[Included\]* `**`
|
||
**|** **eps**
|
||
*\[Open\]* **->** **eps**
|
||
**|** *Open*
|
||
**|** *Open* `,` *\[Open\]*
|
||
*Opens* **->** **eps**
|
||
**|** `open` *\[Open\]* `in`
|
||
*Open* **->** *Ident*
|
||
**|** `(` *QualOpen* *Ident* `)`
|
||
**|** `(` *QualOpen* *Ident* `=` *Ident* `)`
|
||
*ComplMod* **->** **eps**
|
||
**|** `incomplete`
|
||
*QualOpen* **->** **eps**
|
||
*\[Included\]* **->** **eps**
|
||
**|** *Included*
|
||
**|** *Included* `,` *\[Included\]*
|
||
*Included* **->** *Ident*
|
||
**|** *Ident* `[` *\[Ident\]* `]`
|
||
**|** *Ident* `-` `[` *\[Ident\]* `]`
|
||
*Def* **->** *\[Name\]* `:` *Exp*
|
||
**|** *\[Name\]* `=` *Exp*
|
||
**|** *Name* *\[Patt\]* `=` *Exp*
|
||
**|** *\[Name\]* `:` *Exp* `=` *Exp*
|
||
*TopDef* **->** `cat` *\[CatDef\]*
|
||
**|** `fun` *\[FunDef\]*
|
||
**|** `data` *\[FunDef\]*
|
||
**|** `def` *\[Def\]*
|
||
**|** `data` *\[DataDef\]*
|
||
**|** `param` *\[ParDef\]*
|
||
**|** `oper` *\[Def\]*
|
||
**|** `lincat` *\[PrintDef\]*
|
||
**|** `lindef` *\[Def\]*
|
||
**|** `linref` *\[Def\]*
|
||
**|** `lin` *\[Def\]*
|
||
**|** `printname` `cat` *\[PrintDef\]*
|
||
**|** `printname` `fun` *\[PrintDef\]*
|
||
**|** `flags` *\[FlagDef\]*
|
||
*CatDef* **->** *Ident* *\[DDecl\]*
|
||
**|** `[` *Ident* *\[DDecl\]* `]`
|
||
**|** `[` *Ident* *\[DDecl\]* `]` `{` *Integer* `}`
|
||
*FunDef* **->** *\[Ident\]* `:` *Exp*
|
||
*DataDef* **->** *Ident* `=` *\[DataConstr\]*
|
||
*DataConstr* **->** *Ident*
|
||
**|** *Ident* `.` *Ident*
|
||
*\[DataConstr\]* **->** **eps**
|
||
**|** *DataConstr*
|
||
**|** *DataConstr* `|` *\[DataConstr\]*
|
||
*ParDef* **->** *Ident* `=` *\[ParConstr\]*
|
||
**|** *Ident* `=` `(` `in` *Ident* `)`
|
||
**|** *Ident*
|
||
*ParConstr* **->** *Ident* *\[DDecl\]*
|
||
*PrintDef* **->** *\[Name\]* `=` *Exp*
|
||
*FlagDef* **->** *Ident* `=` *Ident*
|
||
*\[Def\]* **->** *Def* `;`
|
||
**|** *Def* `;` *\[Def\]*
|
||
*\[CatDef\]* **->** *CatDef* `;`
|
||
**|** *CatDef* `;` *\[CatDef\]*
|
||
*\[FunDef\]* **->** *FunDef* `;`
|
||
**|** *FunDef* `;` *\[FunDef\]*
|
||
*\[DataDef\]* **->** *DataDef* `;`
|
||
**|** *DataDef* `;` *\[DataDef\]*
|
||
*\[ParDef\]* **->** *ParDef* `;`
|
||
**|** *ParDef* `;` *\[ParDef\]*
|
||
*\[PrintDef\]* **->** *PrintDef* `;`
|
||
**|** *PrintDef* `;` *\[PrintDef\]*
|
||
*\[FlagDef\]* **->** *FlagDef* `;`
|
||
**|** *FlagDef* `;` *\[FlagDef\]*
|
||
*\[ParConstr\]* **->** **eps**
|
||
**|** *ParConstr*
|
||
**|** *ParConstr* `|` *\[ParConstr\]*
|
||
*\[Ident\]* **->** *Ident*
|
||
**|** *Ident* `,` *\[Ident\]*
|
||
*Name* **->** *Ident*
|
||
**|** `[` *Ident* `]`
|
||
*\[Name\]* **->** *Name*
|
||
**|** *Name* `,` *\[Name\]*
|
||
*LocDef* **->** *\[Ident\]* `:` *Exp*
|
||
**|** *\[Ident\]* `=` *Exp*
|
||
**|** *\[Ident\]* `:` *Exp* `=` *Exp*
|
||
*\[LocDef\]* **->** **eps**
|
||
**|** *LocDef*
|
||
**|** *LocDef* `;` *\[LocDef\]*
|
||
*Exp6* **->** *Ident*
|
||
**|** *Sort*
|
||
**|** *String*
|
||
**|** *Integer*
|
||
**|** *Double*
|
||
**|** `?`
|
||
**|** `[` `]`
|
||
**|** `data`
|
||
**|** `[` *Ident* *Exps* `]`
|
||
**|** `[` *String* `]`
|
||
**|** `{` *\[LocDef\]* `}`
|
||
**|** `<` *\[TupleComp\]* `>`
|
||
**|** `<` *Exp* `:` *Exp* `>`
|
||
**|** `(` *Exp* `)`
|
||
*Exp5* **->** *Exp5* `.` *Label*
|
||
**|** *Exp6*
|
||
*Exp4* **->** *Exp4* *Exp5*
|
||
**|** `table` `{` *\[Case\]* `}`
|
||
**|** `table` *Exp6* `{` *\[Case\]* `}`
|
||
**|** `table` *Exp6* `[` *\[Exp\]* `]`
|
||
**|** `case` *Exp* `of` `{` *\[Case\]* `}`
|
||
**|** `variants` `{` *\[Exp\]* `}`
|
||
**|** `pre` `{` *Exp* `;` *\[Altern\]* `}`
|
||
**|** `strs` `{` *\[Exp\]* `}`
|
||
**|** *Ident* `@` *Exp6*
|
||
**|** *Exp5*
|
||
*Exp3* **->** *Exp3* `!` *Exp4*
|
||
**|** *Exp3* `*` *Exp4*
|
||
**|** *Exp3* `**` *Exp4*
|
||
**|** *Exp4*
|
||
*Exp1* **->** *Exp2* `+` *Exp1*
|
||
**|** *Exp2*
|
||
*Exp* **->** *Exp1* `++` *Exp*
|
||
**|** `\` *\[Bind\]* `->` *Exp*
|
||
**|** `\` `\` *\[Bind\]* `=>` *Exp*
|
||
**|** *Decl* `->` *Exp*
|
||
**|** *Exp3* `=>` *Exp*
|
||
**|** `let` `{` *\[LocDef\]* `}` `in` *Exp*
|
||
**|** `let` *\[LocDef\]* `in` *Exp*
|
||
**|** *Exp3* `where` `{` *\[LocDef\]* `}`
|
||
**|** `in` *Exp5* *String*
|
||
**|** *Exp1*
|
||
*Exp2* **->** *Exp3*
|
||
*\[Exp\]* **->** **eps**
|
||
**|** *Exp*
|
||
**|** *Exp* `;` *\[Exp\]*
|
||
*Exps* **->** **eps**
|
||
**|** *Exp6* *Exps*
|
||
*Patt2* **->** `_`
|
||
**|** *Ident*
|
||
**|** *Ident* `.` *Ident*
|
||
**|** *Integer*
|
||
**|** *Double*
|
||
**|** *String*
|
||
**|** `{` *\[PattAss\]* `}`
|
||
**|** `<` *\[PattTupleComp\]* `>`
|
||
**|** `(` *Patt* `)`
|
||
*Patt1* **->** *Ident* *\[Patt\]*
|
||
**|** *Ident* `.` *Ident* *\[Patt\]*
|
||
**|** *Patt2* `*`
|
||
**|** *Ident* `@` *Patt2*
|
||
**|** `-` *Patt2*
|
||
**|** *Patt2*
|
||
*Patt* **->** *Patt* `|` *Patt1*
|
||
**|** *Patt* `+` *Patt1*
|
||
**|** *Patt1*
|
||
*PattAss* **->** *\[Ident\]* `=` *Patt*
|
||
*Label* **->** *Ident*
|
||
**|** `$` *Integer*
|
||
*Sort* **->** `Type`
|
||
**|** `PType`
|
||
**|** `Str`
|
||
**|** `Strs`
|
||
*\[PattAss\]* **->** **eps**
|
||
**|** *PattAss*
|
||
**|** *PattAss* `;` *\[PattAss\]*
|
||
*\[Patt\]* **->** *Patt2*
|
||
**|** *Patt2* *\[Patt\]*
|
||
*Bind* **->** *Ident*
|
||
**|** `_`
|
||
*\[Bind\]* **->** **eps**
|
||
**|** *Bind*
|
||
**|** *Bind* `,` *\[Bind\]*
|
||
*Decl* **->** `(` *\[Bind\]* `:` *Exp* `)`
|
||
**|** *Exp4*
|
||
*TupleComp* **->** *Exp*
|
||
*PattTupleComp* **->** *Patt*
|
||
*\[TupleComp\]* **->** **eps**
|
||
**|** *TupleComp*
|
||
**|** *TupleComp* `,` *\[TupleComp\]*
|
||
*\[PattTupleComp\]* **->** **eps**
|
||
**|** *PattTupleComp*
|
||
**|** *PattTupleComp* `,` *\[PattTupleComp\]*
|
||
*Case* **->** *Patt* `=>` *Exp*
|
||
*\[Case\]* **->** *Case*
|
||
**|** *Case* `;` *\[Case\]*
|
||
*Altern* **->** *Exp* `/` *Exp*
|
||
*\[Altern\]* **->** **eps**
|
||
**|** *Altern*
|
||
**|** *Altern* `;` *\[Altern\]*
|
||
*DDecl* **->** `(` *\[Bind\]* `:` *Exp* `)`
|
||
**|** *Exp6*
|
||
*\[DDecl\]* **->** **eps**
|
||
**|** *DDecl* *\[DDecl\]*
|
||
--------------------- -------- ------------------------------------------------------------------------------------
|