<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
|
|
<html><head><title></title></head>
|
|
<body bgcolor="#ffffff" text="#000000">
|
|
<center>
|
|
|
|
<img src="../gf-logo.gif">
|
|
|
|
<h1>Grammatical Framework Tutorial</h1>
|
|
|
|
<p>
|
|
|
|
<b>3rd Edition, for GF version 2.2 or later</b>
|
|
|
|
<p>
|
|
|
|
<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a>
|
|
|
|
<p>
|
|
|
|
<tt>aarne@cs.chalmers.se</tt>
|
|
|
|
<p>
|
|
|
|
17 May 2005
|
|
</center>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>GF = Grammatical Framework</h2>
|
|
|
|
The term GF is used for different things:
|
|
<ul>
|
|
<li> a <b>program</b> used for working with grammars
|
|
<li> a <b>programming language</b> in which grammars can be written
|
|
<li> a <b>theory</b> about grammars and languages
|
|
</ul>
|
|
|
|
<p>
|
|
|
|
This tutorial is primarily about the GF program and
|
|
the GF programming language.
|
|
It will guide you
|
|
<ul>
|
|
<li> to use the GF program
|
|
<li> to write GF grammars
|
|
<li> to write programs in which GF grammars are used as components
|
|
</ul>
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Getting the GF program</h3>
|
|
|
|
The program is open-source free software, which you can download from the
|
|
GF Homepage:<br>
|
|
<a href="http://www.cs.chalmers.se/%7Eaarne/GF">
|
|
<tt>http://www.cs.chalmers.se/~aarne/GF</tt></a>
|
|
|
|
<p>
|
|
|
|
There you can download
|
|
<ul>
|
|
<li> ready-made binaries for Linux, Solaris, Macintosh, and Windows
|
|
<li> source code and documentation
|
|
<li> grammar libraries and examples
|
|
</ul>
|
|
If you want to compile GF from source, you need Haskell and Java
|
|
compilers. But normally you don't have to compile, and you definitely
|
|
don't need to know Haskell or Java to use GF.
|
|
|
|
<p>
|
|
|
|
To start the GF program, assuming you have installed it, just type
|
|
<pre>
  gf
</pre>
|
|
in the shell. You will see GF's welcome message and the prompt <tt>></tt>.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>My first grammar</h2>
|
|
|
|
Now you are ready to try out your first grammar.
We start with one that is not written in the GF language, but
in the EBNF notation (Extended Backus Naur Form), which GF can also
understand. Type (or copy) the following lines into a file named
<tt>paleolithic.ebnf</tt>:
<pre>
  S   ::= NP VP ;
  VP  ::= V | TV NP | "is" A ;
  NP  ::= ("this" | "that" | "the" | "a") CN ;
  CN  ::= A CN ;
  CN  ::= "boy" | "louse" | "snake" | "worm" ;
  A   ::= "green" | "rotten" | "thick" | "warm" ;
  V   ::= "laughs" | "sleeps" | "swims" ;
  TV  ::= "eats" | "kills" | "washes" ;
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Importing grammars and parsing strings</h3>
|
|
|
|
The first GF command when using a grammar is to <b>import</b> it.
|
|
The command has a long name, <tt>import</tt>, and a short name, <tt>i</tt>.
|
|
<pre>
  > import paleolithic.ebnf
</pre>
|
|
The GF program now <b>compiles</b> your grammar into an internal
|
|
representation, and shows a new prompt when it is ready.
|
|
|
|
<p>
|
|
|
|
You can use GF for <b>parsing</b>:
|
|
<pre>
  > parse "the boy eats a snake"
  Mks_0 (Mks_6 Mks_9) (Mks_2 Mks_20 (Mks_7 Mks_11))

  > parse "the snake eats a boy"
  Mks_0 (Mks_6 Mks_11) (Mks_2 Mks_20 (Mks_7 Mks_9))
</pre>
|
|
The <tt>parse</tt> (= <tt>p</tt>) command takes a <b>string</b>
|
|
(in double quotes) and returns an <b>abstract syntax tree</b> - the thing
|
|
with <tt>Mks</tt>s and parentheses. We will see soon how to make sense
|
|
of the abstract syntax trees - now you should just notice that the tree
|
|
is different for the two strings.
|
|
|
|
<p>
|
|
|
|
A string returns a tree only if it is covered by the grammar
you imported. Try parsing anything else, and parsing fails:
<pre>
  > p "hello world"
  No success in cf parsing
  no tree found
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Generating trees and strings</h3>
|
|
|
|
You can also use GF for <b>linearizing</b>
|
|
(<tt>linearize = l</tt>). This is the inverse of
|
|
parsing, taking trees into strings:
|
|
<pre>
  > linearize Mks_0 (Mks_6 Mks_11) (Mks_2 Mks_20 (Mks_7 Mks_9))
  the snake eats a boy
</pre>
|
|
What is the use of this? Typically not that you type in a tree at
the GF prompt. The utility of linearization comes from the fact that
you can obtain a tree from somewhere else. One way to do so is
<b>random generation</b> (<tt>generate_random = gr</tt>):
<pre>
  > generate_random
  Mks_0 (Mks_4 Mks_11) (Mks_3 Mks_15)
</pre>
Now you can copy the tree and paste it to the <tt>linearize</tt> command.
Or, more efficiently, feed random generation into linearization by using
a <b>pipe</b>:
<pre>
  > gr | l
  this worm is warm
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Some random-generated sentences</h3>
|
|
|
|
Random generation can be quite amusing. So you may want to
|
|
generate ten strings with one and the same command:
|
|
<pre>
  > gr -number=10 | l
  this boy is green
  a snake laughs
  the rotten boy is thick
  a boy washes this worm
  a boy is warm
  this green warm boy is rotten
  the green thick green louse is rotten
  that boy is green
  this thick thick boy laughs
  a boy is green
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Systematic generation</h3>
|
|
|
|
To generate <i>all</i> sentences that a grammar
can generate, use the command <tt>generate_trees = gt</tt>.
<pre>
  > generate_trees | l
  this louse laughs
  this louse sleeps
  this louse swims
  this louse is green
  this louse is rotten
  ...
  a boy is rotten
  a boy is thick
  a boy is warm
</pre>
|
|
You get quite a few trees but not all of them: only up to a given
|
|
<b>depth</b> of trees. To see how you can get more, use the
|
|
<tt>help = h</tt> command,
|
|
<pre>
  > help gt
</pre>
|
|
<b>Quiz</b>. If the command <tt>gt</tt> generated all
|
|
trees in your grammar, it would never terminate. Why?
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>More on pipes; tracing</h3>
|
|
|
|
A pipe of GF commands can have any length, but the "output type"
|
|
(either string or tree) of one command must always match the "input type"
|
|
of the next command.
|
|
|
|
<p>
|
|
|
|
The intermediate results in a pipe can be observed by putting the
|
|
<b>tracing</b> flag <tt>-tr</tt> to each command whose output you
|
|
want to see:
|
|
<pre>
  > gr -tr | l -tr | p
  Mks_0 (Mks_7 Mks_10) (Mks_1 Mks_18)
  a louse sleeps
  Mks_0 (Mks_7 Mks_10) (Mks_1 Mks_18)
</pre>
|
|
This facility is good for test purposes: for instance, you
|
|
may want to see if a grammar is <b>ambiguous</b>, i.e.
|
|
contains strings that can be parsed in more than one way.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Writing and reading files</h3>
|
|
|
|
To save the outputs of GF commands into a file, you can
pipe them to the <tt>write_file = wf</tt> command,
<pre>
  > gr -number=10 | l | write_file exx.tmp
</pre>
|
|
You can read the file back to GF with the
|
|
<tt>read_file = rf</tt> command,
|
|
<pre>
  > read_file exx.tmp | p -lines
</pre>
|
|
Notice the flag <tt>-lines</tt> given to the parsing
|
|
command. This flag tells GF to parse each line of
|
|
the file separately. Without the flag, the grammar could
|
|
not recognize the string in the file, because it is not
|
|
a sentence but a sequence of ten sentences.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Labelled context-free grammars</h3>
|
|
|
|
The syntax trees returned by GF's parser in the previous examples
|
|
are not so nice to look at. The identifiers of form <tt>Mks</tt>
|
|
are <b>labels</b> of the EBNF rules. To see which label corresponds to
|
|
which rule, you can use the <tt>print_grammar = pg</tt> command
|
|
with the <tt>printer</tt> flag set to <tt>cf</tt> (which means context-free):
|
|
<pre>
  > print_grammar -printer=cf
  Mks_10. CN ::= "louse" ;
  Mks_11. CN ::= "snake" ;
  Mks_12. CN ::= "worm" ;
  Mks_8.  CN ::= A CN ;
  Mks_9.  CN ::= "boy" ;
  Mks_4.  NP ::= "this" CN ;
  Mks_15. A  ::= "thick" ;
  ...
</pre>
|
|
A syntax tree such as
|
|
<pre>
  Mks_4 (Mks_8 Mks_15 Mks_12)
  this thick worm
</pre>
|
|
encodes the sequence of grammar rules used for building the
|
|
expression. If you look at this tree, you will notice that <tt>Mks_4</tt>
|
|
is the label of the rule prefixing <tt>this</tt> to a common noun,
|
|
<tt>Mks_15</tt> is the label of the adjective <tt>thick</tt>,
|
|
and so on.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>The labelled context-free format</h4>
|
|
|
|
The <b>labelled context-free grammar</b> format lets you attach a
user-defined label to each rule. GF recognizes files of this format by the suffix
<tt>.cf</tt>. Let us include the following rules in the file
<tt>paleolithic.cf</tt>.
<pre>
  PredVP.  S  ::= NP VP ;
  UseV.    VP ::= V ;
  ComplTV. VP ::= TV NP ;
  UseA.    VP ::= "is" A ;
  This.    NP ::= "this" CN ;
  That.    NP ::= "that" CN ;
  Def.     NP ::= "the" CN ;
  Indef.   NP ::= "a" CN ;
  ModA.    CN ::= A CN ;
  Boy.     CN ::= "boy" ;
  Louse.   CN ::= "louse" ;
  Snake.   CN ::= "snake" ;
  Worm.    CN ::= "worm" ;
  Green.   A  ::= "green" ;
  Rotten.  A  ::= "rotten" ;
  Thick.   A  ::= "thick" ;
  Warm.    A  ::= "warm" ;
  Laugh.   V  ::= "laughs" ;
  Sleep.   V  ::= "sleeps" ;
  Swim.    V  ::= "swims" ;
  Eat.     TV ::= "eats" ;
  Kill.    TV ::= "kills" ;
  Wash.    TV ::= "washes" ;
</pre>
|
|
|
|
<!-- NEW -->
|
|
<h4>Using the labelled context-free format</h4>
|
|
|
|
The GF commands for the <tt>.cf</tt> format are
exactly the same as for the <tt>.ebnf</tt> format.
Only the syntax trees become nicer to read and
to remember. Notice that before reading a new grammar
into GF, you often (but not always, as we will see later)
first have to give the command <tt>empty = e</tt>, which removes the
old grammar from the GF shell state.
<pre>
  > empty

  > i paleolithic.cf

  > p "the boy eats a snake"
  PredVP (Def Boy) (ComplTV Eat (Indef Snake))

  > gr -tr | l
  PredVP (Indef Louse) (UseA Thick)
  a louse is thick
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>The GF grammar format</h2>
|
|
|
|
To see what there really is in GF's shell state when a grammar
|
|
has been imported, you can give the plain command
|
|
<tt>print_grammar = pg</tt>.
|
|
<pre>
  > print_grammar
</pre>
|
|
The output is quite unreadable at this stage, and you may feel happy that
|
|
you did not need to write the grammar in that notation, but that the
|
|
GF grammar compiler produced it.
|
|
|
|
<p>
|
|
|
|
However, we will now start to show how GF's own notation gives you
|
|
much more expressive power than the <tt>.cf</tt> and <tt>.ebnf</tt>
|
|
formats. We will introduce the <tt>.gf</tt> format by presenting
|
|
one more way of defining the same grammar as in
|
|
<tt>paleolithic.cf</tt> and <tt>paleolithic.ebnf</tt>.
|
|
Then we will show how the full GF grammar format enables you
|
|
to do things that are not possible in the weaker formats.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Abstract and concrete syntax</h3>
|
|
|
|
A GF grammar consists of two main parts:
|
|
<ul>
|
|
<li> <b>abstract syntax</b>, defining what syntax trees there are
|
|
<li> <b>concrete syntax</b>, defining how trees are linearized into strings
|
|
</ul>
|
|
The EBNF and CF formats fuse these two things together, but it is possible
|
|
to take them apart. For instance, the verb phrase predication rule
|
|
<pre>
  PredVP. S ::= NP VP ;
</pre>
is interpreted as the following pair of rules:
<pre>
  fun PredVP : NP -> VP -> S ;
  lin PredVP x y = {s = x.s ++ y.s} ;
</pre>
|
|
The former rule, with the keyword <tt>fun</tt>, belongs to the abstract syntax.
|
|
It defines the <b>function</b>
|
|
<tt>PredVP</tt> which constructs syntax trees of form
|
|
(<tt>PredVP</tt> <i>x</i> <i>y</i>).
|
|
|
|
<p>
|
|
|
|
The latter rule, with the keyword <tt>lin</tt>, belongs to the concrete syntax.
|
|
It defines the <b>linearization function</b> for
|
|
syntax trees of form (<tt>PredVP</tt> <i>x</i> <i>y</i>).
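<p>

As a further illustration, here is a sketch of how the same decomposition
could look for a rule that contains a terminal. It assumes the linearization
type <tt>{s : Str}</tt> for both categories; the variable name <tt>a</tt> is an
arbitrary choice. The labelled rule
<pre>
  UseA. VP ::= "is" A ;
</pre>
would correspond to the pair
<pre>
  fun UseA : A -> VP ;
  lin UseA a = {s = "is" ++ a.s} ;
</pre>
The terminal <tt>"is"</tt> appears only in the <tt>lin</tt> rule: the abstract
syntax records which categories are combined, not which words are used.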
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Judgement forms</h4>
|
|
|
|
Rules in a GF grammar are called <b>judgements</b>, and the keywords
|
|
<tt>fun</tt> and <tt>lin</tt> are used for distinguishing between two
|
|
<b>judgement forms</b>. Here is a summary of the most important
|
|
judgement forms:
|
|
<ul>
|
|
<li> abstract syntax
|
|
<ul>
|
|
<li> cat C
|
|
<li> fun f : A
|
|
</ul>
|
|
<li> concrete syntax
|
|
<ul>
|
|
<li> lincat C = T
|
|
<li> lin f x ... y = t
|
|
</ul>
|
|
</ul>
|
|
We return to the precise meanings of these judgement forms later.
|
|
First we will look at how judgements are grouped into modules, and
|
|
show how the grammar <tt>paleolithic.cf</tt> is
|
|
expressed by using modules and judgements.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Module types</h4>
|
|
|
|
A GF grammar consists of <b>modules</b>,
|
|
into which judgements are grouped. The most important
|
|
module forms are
|
|
<ul>
|
|
<li> <tt>abstract</tt> A <tt>=</tt> M, abstract syntax A with judgements in
  the module body M.
<li> <tt>concrete</tt> C <tt>of</tt> A <tt>=</tt> M, concrete syntax C of the
  abstract syntax A, with judgements in the module body M.
|
|
</ul>
|
|
|
|
<!-- NEW -->
|
|
<h4>An abstract syntax example</h4>
|
|
|
|
Each nonterminal occurring in <tt>paleolithic.cf</tt> is
|
|
introduced by a <tt>cat</tt> judgement. Each
|
|
rule label is introduced by a <tt>fun</tt> judgement.
|
|
<pre>
  abstract Paleolithic = {
    cat
      S ; NP ; VP ; CN ; A ; V ; TV ;
    fun
      PredVP  : NP -> VP -> S ;
      UseV    : V -> VP ;
      ComplTV : TV -> NP -> VP ;
      UseA    : A -> VP ;
      ModA    : A -> CN -> CN ;
      This, That, Def, Indef : CN -> NP ;
      Boy, Louse, Snake, Worm : CN ;
      Green, Rotten, Thick, Warm : A ;
      Laugh, Sleep, Swim : V ;
      Eat, Kill, Wash : TV ;
  }
</pre>
|
|
Notice the use of shorthands permitting the sharing of
|
|
the keyword in subsequent judgements, and of the type
|
|
in subsequent <tt>fun</tt> judgements.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>A concrete syntax example</h4>
|
|
|
|
Each category introduced in <tt>Paleolithic.gf</tt> is
given a <tt>lincat</tt> rule, and each
function is given a <tt>lin</tt> rule. Similar shorthands
apply as in <tt>abstract</tt> modules.
<pre>
  concrete PaleolithicEng of Paleolithic = {
    lincat
      S, NP, VP, CN, A, V, TV = {s : Str} ;
    lin
      PredVP np vp  = {s = np.s ++ vp.s} ;
      UseV   v      = v ;
      ComplTV tv np = {s = tv.s ++ np.s} ;
      UseA   a   = {s = "is" ++ a.s} ;
      This  cn   = {s = "this" ++ cn.s} ;
      That  cn   = {s = "that" ++ cn.s} ;
      Def   cn   = {s = "the" ++ cn.s} ;
      Indef cn   = {s = "a" ++ cn.s} ;
      ModA  a cn = {s = a.s ++ cn.s} ;
      Boy    = {s = "boy"} ;
      Louse  = {s = "louse"} ;
      Snake  = {s = "snake"} ;
      Worm   = {s = "worm"} ;
      Green  = {s = "green"} ;
      Rotten = {s = "rotten"} ;
      Thick  = {s = "thick"} ;
      Warm   = {s = "warm"} ;
      Laugh  = {s = "laughs"} ;
      Sleep  = {s = "sleeps"} ;
      Swim   = {s = "swims"} ;
      Eat    = {s = "eats"} ;
      Kill   = {s = "kills"} ;
      Wash   = {s = "washes"} ;
  }
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Modules and files</h4>
|
|
|
|
A module is stored in a file whose name is the module name followed by <tt>.gf</tt>.

<p>

Each module is compiled into a <tt>.gfc</tt> file.

<p>

Import <tt>PaleolithicEng.gf</tt> and see what happens:
<pre>
  > i PaleolithicEng.gf
</pre>
|
|
The GF program does not only read the file
|
|
<tt>PaleolithicEng.gf</tt>, but also all other files that it
|
|
depends on - in this case, <tt>Paleolithic.gf</tt>.
|
|
|
|
<p>
|
|
|
|
For each file that is compiled, a <tt>.gfc</tt> file
is generated. The GFC format (="GF Canonical") is the
"machine code" of GF, which is faster to process than
GF source files. When reading a module, GF decides whether
to use an existing <tt>.gfc</tt> file or to generate
a new one by comparing modification times.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Multilingual grammar</h4>
|
|
|
|
The main advantage of separating abstract from concrete syntax is that
|
|
one abstract syntax can be equipped with many concrete syntaxes.
|
|
A system with this property is called a <b>multilingual grammar</b>.
|
|
|
|
<p>
|
|
|
|
Multilingual grammars can be used for applications such as
|
|
translation. Let us build an Italian concrete syntax for
|
|
<tt>Paleolithic</tt> and then test the resulting
|
|
multilingual grammar.
|
|
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>An Italian concrete syntax</h4>
|
|
|
|
<pre>
  concrete PaleolithicIta of Paleolithic = {
    lincat
      S, NP, VP, CN, A, V, TV = {s : Str} ;
    lin
      PredVP np vp  = {s = np.s ++ vp.s} ;
      UseV   v      = v ;
      ComplTV tv np = {s = tv.s ++ np.s} ;
      UseA   a   = {s = "è" ++ a.s} ;
      This  cn   = {s = "questo" ++ cn.s} ;
      That  cn   = {s = "quello" ++ cn.s} ;
      Def   cn   = {s = "il" ++ cn.s} ;
      Indef cn   = {s = "un" ++ cn.s} ;
      ModA  a cn = {s = cn.s ++ a.s} ;
      Boy    = {s = "ragazzo"} ;
      Louse  = {s = "pidocchio"} ;
      Snake  = {s = "serpente"} ;
      Worm   = {s = "verme"} ;
      Green  = {s = "verde"} ;
      Rotten = {s = "marcio"} ;
      Thick  = {s = "grosso"} ;
      Warm   = {s = "caldo"} ;
      Laugh  = {s = "ride"} ;
      Sleep  = {s = "dorme"} ;
      Swim   = {s = "nuota"} ;
      Eat    = {s = "mangia"} ;
      Kill   = {s = "uccide"} ;
      Wash   = {s = "lava"} ;
  }
</pre>
|
|
|
|
<!-- NEW -->
|
|
<h4>Using a multilingual grammar</h4>
|
|
|
|
Import both concrete syntaxes without emptying the shell state in between:
<pre>
  > i PaleolithicEng.gf
  > i PaleolithicIta.gf
</pre>
Try generation now:
<pre>
  > gr | l
  un pidocchio uccide questo ragazzo

  > gr | l -lang=PaleolithicEng
  that louse eats a louse
</pre>
Translate by using a pipe:
<pre>
  > p -lang=PaleolithicEng "the boy eats the snake" | l -lang=PaleolithicIta
  il ragazzo mangia il serpente
</pre>
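<p>

Since both concrete syntaxes share one abstract syntax, the same pipe should
also work in the opposite direction. A sketch, using only the commands and
flags shown above:
<pre>
  > p -lang=PaleolithicIta "il ragazzo mangia il serpente" | l -lang=PaleolithicEng
</pre>
With the two grammars as given, the expected result is the English sentence
<i>the boy eats the snake</i>.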
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Translation quiz</h4>
|
|
|
|
This is a simple kind of language exercise that can be automatically
generated from a multilingual grammar. The system generates a set of
random sentences, displays them in one language, and checks the user's
answer given in another language. The command <tt>translation_quiz = tq</tt>
runs the quiz in a subshell of GF.
|
|
<pre>
  > translation_quiz PaleolithicEng PaleolithicIta

  Welcome to GF Translation Quiz.
  The quiz is over when you have done at least 10 examples
  with at least 75 % success.
  You can interrupt the quiz by entering a line consisting of a dot ('.').

  a green boy washes the louse
  un ragazzo verde lava il gatto

  No, not un ragazzo verde lava il gatto, but
  un ragazzo verde lava il pidocchio
  Score 0/1
</pre>
|
|
You can also generate a list of translation exercises and save it in a
|
|
file for later use, by the command <tt>translation_list = tl</tt>
|
|
<pre>
  > translation_list -number=25 PaleolithicEng PaleolithicIta
</pre>
|
|
The number flag gives the number of sentences generated.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>The multilingual shell state</h4>
|
|
|
|
A GF shell is at any time in a state, which
|
|
contains a multilingual grammar. One of the concrete
|
|
syntaxes is the "main" one, which means that parsing and linearization
|
|
are performed by using it. By default, the main concrete syntax is the
|
|
last-imported one. As we saw above, the <tt>lang</tt> flag
can be used to change the linearization and parsing grammar.
|
|
|
|
<p>
|
|
|
|
To see what the multilingual grammar is (as well as some other
|
|
things), you can use the command
|
|
<tt>print_options = po</tt>:
|
|
<pre>
  > print_options
  main abstract :  Paleolithic
  main concrete :  PaleolithicIta
  all concretes :  PaleolithicIta PaleolithicEng
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Extending a grammar</h4>
|
|
|
|
The module system of GF makes it possible to <b>extend</b> a
|
|
grammar in different ways. The syntax of extension is
|
|
shown by the following example.
|
|
<pre>
  abstract Neolithic = Paleolithic ** {
    fun
      Fire, Wheel : CN ;
      Think : V ;
  }
</pre>
Parallel to the abstract syntax, extensions can
be built for concrete syntaxes:
<pre>
  concrete NeolithicEng of Neolithic = PaleolithicEng ** {
    lin
      Fire  = {s = "fire"} ;
      Wheel = {s = "wheel"} ;
      Think = {s = "thinks"} ;
  }
</pre>
|
|
The effect of extension is that all of the contents of the extended
|
|
and extending module are put together.
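<p>

An Italian extension could be built in the same way. The following is only a
sketch: the module name <tt>NeolithicIta</tt> and the choice of Italian words
are assumptions made for illustration.
<pre>
  concrete NeolithicIta of Neolithic = PaleolithicIta ** {
    lin
      Fire  = {s = "fuoco"} ;
      Wheel = {s = "ruota"} ;
      Think = {s = "pensa"} ;
  }
</pre>
Because <tt>PaleolithicIta</tt> always uses the masculine forms of the
determiners, a feminine noun such as <i>ruota</i> is linearized with the wrong
article (<i>il ruota</i>). The simple string-based linearization types
cannot express this kind of agreement.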
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Multiple inheritance</h4>
|
|
|
|
Specialized vocabularies can be represented as small grammars that
|
|
only do "one thing" each, e.g.
|
|
<pre>
  abstract Fish = {
    cat Fish ;
    fun Salmon, Perch : Fish ;
  }

  abstract Mushrooms = {
    cat Mushroom ;
    fun Cep, Agaric : Mushroom ;
  }
</pre>
They can afterwards be combined in bigger grammars by using
<b>multiple inheritance</b>, i.e. extension of several grammars at the
same time:
<pre>
  abstract Gatherer = Paleolithic, Fish, Mushrooms ** {
    fun
      UseFish     : Fish     -> CN ;
      UseMushroom : Mushroom -> CN ;
  }
</pre>
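<p>

The concrete syntaxes can follow the same inheritance pattern. Here is a
minimal sketch of an English set, assuming the linearization type
<tt>{s : Str}</tt> everywhere; the module names <tt>FishEng</tt> and
<tt>MushroomsEng</tt> are chosen for illustration only.
<pre>
  concrete FishEng of Fish = {
    lincat Fish = {s : Str} ;
    lin
      Salmon = {s = "salmon"} ;
      Perch  = {s = "perch"} ;
  }

  concrete MushroomsEng of Mushrooms = {
    lincat Mushroom = {s : Str} ;
    lin
      Cep    = {s = "cep"} ;
      Agaric = {s = "agaric"} ;
  }

  concrete GathererEng of Gatherer = PaleolithicEng, FishEng, MushroomsEng ** {
    lin
      UseFish x     = x ;
      UseMushroom x = x ;
  }
</pre>
The two <tt>Use</tt> functions can return their argument unchanged, in the same
way as <tt>UseV</tt> does, because the argument and value categories have the
same linearization type.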
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Visualizing module structure</h4>
|
|
|
|
When you have created all the abstract syntaxes and
one set of concrete syntaxes needed for <tt>Gatherer</tt>,
your grammar consists of eight GF modules. To see what their
dependencies look like, you can use the command
<tt>visualize_graph = vg</tt>,
<pre>
  > visualize_graph
</pre>
|
|
and the graph will pop up in a separate window. It can also
|
|
be printed out into a file, e.g. a <tt>.gif</tt> file that
|
|
can be included in an HTML document
|
|
<pre>
  > pm -printer=graph | wf Gatherer.dot
  > ! dot -Tgif Gatherer.dot > Gatherer.gif
</pre>
|
|
The latter command is a Unix command, issued from GF by using the
|
|
shell escape symbol <tt>!</tt>. The resulting graph is shown in the next section.
|
|
|
|
<p>
|
|
|
|
The command <tt>print_multi = pm</tt> is used for printing the current multilingual
|
|
grammar in various formats, of which the format <tt>-printer=graph</tt> just
|
|
shows the module dependencies.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>The module structure of <tt>GathererEng</tt></h4>
|
|
|
|
The graph uses
|
|
<ul>
|
|
<li> oval boxes for abstract modules
|
|
<li> square boxes for concrete modules
|
|
<li> black-headed arrows for inheritance
|
|
<li> white-headed arrows for the concrete-of-abstract relation
|
|
</ul>
|
|
|
|
<p>
|
|
|
|
<img src="Gatherer.gif">
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Resource modules</h3>
|
|
|
|
Suppose we want to say, with the vocabulary included in
|
|
<tt>Paleolithic.gf</tt>, things like
|
|
<pre>
  the boy eats two snakes
  all boys sleep
</pre>
The new grammatical facility we need is plural inflection: the plural forms
of nouns and verbs (<i>boys, sleep</i>), as opposed to their
singular forms.
|
|
|
|
<p>
|
|
|
|
The introduction of plural forms requires two things:
|
|
<ul>
|
|
<li> to <b>inflect</b> nouns and verbs in singular and plural number
|
|
<li> to describe the <b>agreement</b> of the verb to subject: the
|
|
rule that the verb must have the same number as the subject
|
|
</ul>
|
|
Different languages have different rules of inflection and agreement.
|
|
For instance, Italian also has agreement in gender (masculine vs. feminine).
|
|
We want to be able to ignore such differences in the abstract
|
|
syntax.
|
|
|
|
<p>
|
|
|
|
To be able to do all this, we need a couple of new judgement forms,
|
|
a new module form, and a more powerful way of expressing linearization
|
|
rules.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Parameters and tables</h4>
|
|
|
|
We define the <b>parameter type</b> of number in English by
using a new form of judgement:
<pre>
  param Number = Sg | Pl ;
</pre>
To express that nouns in English have a linearization
depending on number, we replace the linearization type <tt>{s : Str}</tt>
with a type where the <tt>s</tt> field is a <b>table</b> depending on number:
<pre>
  lincat CN = {s : Number => Str} ;
</pre>
The <b>table type</b> <tt>Number => Str</tt> is in many respects similar to
a function type (<tt>Number -> Str</tt>). The main restriction is that the
argument type of a table type must always be a parameter type. This means
that the argument-value pairs can be listed in a finite table. The following
example shows such a table:
<pre>
  lin Boy = {s = table {
    Sg => "boy" ;
    Pl => "boys"
    }
  } ;
</pre>
|
|
The application of a table to a parameter is done by the <b>selection</b>
|
|
operator <tt>!</tt>. For instance,
|
|
<pre>
  Boy.s ! Pl
</pre>
|
|
is a selection, whose value is <tt>"boys"</tt>.
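<p>

Selection is what a linearization rule uses to pick the form it needs from
an argument. A minimal sketch, assuming that <tt>CN</tt> has the table type
given above and that <tt>NP</tt> keeps the type <tt>{s : Str}</tt>:
<pre>
  lin This cn = {s = "this" ++ cn.s ! Sg} ;
</pre>
A plural determiner would select <tt>Pl</tt> instead. The function
<tt>All : CN -> NP</tt> below is hypothetical; it is not part of
<tt>Paleolithic</tt> and is shown only to illustrate the sentence
<i>all boys sleep</i>:
<pre>
  lin All cn = {s = "all" ++ cn.s ! Pl} ;
</pre>
This sketch does not yet handle agreement: for the verb to get the right
number, the noun phrase would also have to carry its number.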
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Inflection tables, paradigms, and <tt>oper</tt> definitions</h4>
|
|
|
|
All English common nouns are inflected in number, most of them in the
|
|
same way: the plural form is formed from the singular form by adding the
|
|
ending <i>s</i>. This rule is an example of
|
|
a <b>paradigm</b> - a formula telling how the inflection
|
|
forms of a word are formed.
|
|
|
|
<p>
|
|
|
|
From the GF point of view, a paradigm is a function that takes a <b>lemma</b> -
a string also known as a <b>dictionary form</b> - and returns an inflection
table of the desired type. Paradigms are not functions in the sense of the
<tt>fun</tt> judgements of abstract syntax (which operate on trees and not
on strings). For the sake of clarity, we therefore call them <b>operations</b> and
introduce one more form of judgement, with the keyword <tt>oper</tt>. As an
example, the following operation defines the regular noun paradigm of English:
<pre>
  oper regNoun : Str -> {s : Number => Str} = \x -> {
    s = table {
      Sg => x ;
      Pl => x + "s"
      }
    } ;
</pre>
|
|
Thus an <tt>oper</tt> judgement includes the name of the defined operation,
|
|
its type, and an expression defining it. As for the syntax of the defining
|
|
expression, notice the <b>lambda abstraction</b> form <tt>\x -> t</tt> of
|
|
the function, and the <b>glueing</b> operator <tt>+</tt> telling that
|
|
the string held in the variable <tt>x</tt> and the ending <tt>"s"</tt>
|
|
are written together to form one <b>token</b>.
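<p>

To see how such an operation is used, it may help to unfold an application by
hand. The following is an informal computation, not GF code; each line is
equal to the one above it:
<pre>
  regNoun "snake"
    = {s = table {Sg => "snake" ; Pl => "snake" + "s"}}
    = {s = table {Sg => "snake" ; Pl => "snakes"}}

  (regNoun "snake").s ! Pl
    = "snakes"
</pre>
The lambda-bound variable <tt>x</tt> is replaced by the argument
<tt>"snake"</tt>, the glueing is carried out, and selection then picks the
branch for <tt>Pl</tt>.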
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>The <tt>resource</tt> module type</h4>
|
|
|
|
Parameter and operation definitions do not belong to the abstract syntax.
|
|
They can be used when defining concrete syntax - but they are not
|
|
tied to a particular set of linearization rules.
|
|
The proper way to see them is as auxiliary concepts, as <b>resources</b>
|
|
usable in many concrete syntaxes.
|
|
|
|
<p>
|
|
|
|
The <tt>resource</tt> module type thus consists of
|
|
<tt>param</tt> and <tt>oper</tt> definitions. Here is an
|
|
example.
|
|
<pre>
  resource MorphoEng = {
    param
      Number = Sg | Pl ;
    oper
      Noun : Type = {s : Number => Str} ;
      regNoun : Str -> Noun = \x -> {
        s = table {
          Sg => x ;
          Pl => x + "s"
          }
        } ;
  }
</pre>
|
|
Resource modules can extend other resource modules, in the
|
|
same way as modules of other types can extend modules of the
|
|
same type.
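<p>

As a sketch of such an extension, the following resource adds a verb type and
a regular verb paradigm to <tt>MorphoEng</tt>. The names
<tt>MorphoVerbEng</tt>, <tt>Verb</tt>, and <tt>regVerb</tt> are hypothetical,
and the paradigm only covers the third-person present tense forms used in
this tutorial (<i>sleeps</i> vs. <i>sleep</i>).
<pre>
  resource MorphoVerbEng = MorphoEng ** {
    oper
      Verb : Type = {s : Number => Str} ;
      regVerb : Str -> Verb = \x -> {
        s = table {
          Sg => x + "s" ;
          Pl => x
          }
        } ;
  }
</pre>
The parameter type <tt>Number</tt> is inherited from <tt>MorphoEng</tt>, so it
does not have to be defined again.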
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h3>Opening a <tt>resource</tt></h3>
|
|
|
|
Any number of <tt>resource</tt> modules can be
|
|
<b>opened</b> in a <tt>concrete</tt> syntax, which
|
|
makes the parameter and operation definitions contained
|
|
in the resource usable in the concrete syntax. Here is
|
|
an example, where the resource <tt>MorphoEng</tt> is
|
|
open in (the fragment of) a new version of <tt>PaleolithicEng</tt>.
|
|
<pre>
  concrete PaleolithicEng of Paleolithic = open MorphoEng in {
    lincat
      CN = Noun ;
    lin
      Boy   = regNoun "boy" ;
      Snake = regNoun "snake" ;
      Worm  = regNoun "worm" ;
  }
</pre>
|
|
Notice that, just like in abstract syntax, function application
|
|
is written by juxtaposition of the function and the argument.
|
|
|
|
<p>
|
|
|
|
Using operations defined in resource modules is clearly a concise
|
|
way of giving e.g. inflection tables and other repeated patterns
|
|
of expression. In addition, it enables a new kind of modularity
|
|
and division of labour in grammar writing: grammarians familiar with
the linguistic details of a language can make this knowledge
available through resource grammars, whose users only need
to pick the right operations and do not need to know their implementation
details.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>Worst-case macros and data abstraction</h4>
|
|
|
|
Some English nouns, such as <tt>louse</tt>, are so irregular that
|
|
it makes little sense to see them as instances of a paradigm. Even
|
|
then, it is useful to perform <b>data abstraction</b> from the
|
|
definition of the type <tt>Noun</tt>, and introduce a constructor
|
|
operation, a <b>worst-case macro</b> for nouns:
|
|
<pre>
  oper mkNoun : Str -> Str -> Noun = \x,y -> {
    s = table {
      Sg => x ;
      Pl => y
      }
    } ;
</pre>
Thus we define
<pre>
  lin Louse = mkNoun "louse" "lice" ;
</pre>
|
|
instead of writing the inflection table explicitly.
|
|
|
|
<p>
|
|
|
|
The grammar engineering advantage of worst-case macros is that
|
|
the author of the resource module may change the definitions of
|
|
<tt>Noun</tt> and <tt>mkNoun</tt>, and still retain the
|
|
interface (i.e. the system of type signatures) that makes it
|
|
correct to use these functions in concrete modules. In programming
|
|
terms, <tt>Noun</tt> is then treated as an <b>abstract datatype</b>.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>A system of paradigms using <tt>Prelude</tt> operations</h4>
|
|
|
|
The regular noun paradigm <tt>regNoun</tt> can - and should - of course be defined
|
|
by the worst-case macro <tt>mkNoun</tt>. In addition, some more noun paradigms
|
|
could be defined, for instance,
|
|
<pre>
  regNoun : Str -> Noun = \snake -> mkNoun snake (snake + "s") ;
  sNoun   : Str -> Noun = \kiss  -> mkNoun kiss (kiss + "es") ;
</pre>
What about nouns like <i>fly</i>, with the plural <i>flies</i>? The already
available solution is to use the so-called "technical stem" <i>fl</i> as
argument, and define
<pre>
  yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ;
</pre>
But this paradigm would be very unintuitive to use, because the "technical stem"
is not even an existing form of the word. A better solution is to use
the string operator <tt>init</tt>, which returns the initial segment (i.e.
all characters but the last) of a string:
<pre>
  yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
</pre>
|
|
The operator <tt>init</tt> belongs to a set of operations in the
|
|
resource module <tt>Prelude</tt>, which therefore has to be
|
|
<tt>open</tt>ed so that <tt>init</tt> can be used.
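<p>

Put together, the resource module might then look as follows. This is a
sketch that assumes a resource module can open another resource with the same
<tt>open ... in</tt> syntax as a concrete module.
<pre>
  resource MorphoEng = open Prelude in {
    param
      Number = Sg | Pl ;
    oper
      Noun : Type = {s : Number => Str} ;
      mkNoun : Str -> Str -> Noun = \x,y -> {
        s = table {
          Sg => x ;
          Pl => y
          }
        } ;
      regNoun : Str -> Noun = \snake -> mkNoun snake (snake + "s") ;
      sNoun   : Str -> Noun = \kiss  -> mkNoun kiss (kiss + "es") ;
      yNoun   : Str -> Noun = \fly   -> mkNoun fly (init fly + "ies") ;
  }
</pre>
Only <tt>yNoun</tt> needs <tt>Prelude</tt>; the other paradigms use nothing
but <tt>mkNoun</tt> and the glueing operator.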
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h4>An intelligent noun paradigm using <tt>case</tt> expressions</h4>
|
|
|
|
It may be hard for the user of a resource morphology to pick the right
inflection paradigm. One way to help is to define a more intelligent
paradigm, which chooses the ending by first analysing the lemma.
The following variant for English regular nouns puts together all the
previously shown paradigms, and chooses one of them on the basis of
the final letter of the lemma.
<pre>
  regNoun : Str -> Noun = \s -> case last s of {
    "s" | "z" => mkNoun s (s + "es") ;
    "y"       => mkNoun s (init s + "ies") ;
    _         => mkNoun s (s + "s")
    } ;
</pre>
This definition displays many GF expression forms not shown before;
these forms are explained in the following section.
|
|
|
|
<p>
|
|
|
|
The paradigm <tt>regNoun</tt> does not give the correct forms for
all nouns. For instance, <i>louse - lice</i> and
<i>fish - fish</i> must be given by using <tt>mkNoun</tt>.
|
|
Also the word <i>boy</i> would be inflected incorrectly; to prevent
|
|
this, either use <tt>mkNoun</tt> or modify
|
|
<tt>regNoun</tt> so that the <tt>"y"</tt> case does not
|
|
apply if the second-last character is a vowel.
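
<p>

The second alternative could be sketched as follows, using only the
<tt>Prelude</tt> operations <tt>last</tt> and <tt>init</tt> and a nested
<tt>case</tt> expression. The choice of vowels is an assumption of this
sketch, and irregular nouns still have to be given with <tt>mkNoun</tt>.
<pre>
  regNoun : Str -> Noun = \s -> case last s of {
    "s" | "z" => mkNoun s (s + "es") ;
    -- look at the second-last character before choosing the "y" rule
    "y" => case last (init s) of {
      "a" | "e" | "o" | "u" => mkNoun s (s + "s") ;   -- boy - boys
      _ => mkNoun s (init s + "ies")                  -- fly - flies
      } ;
    _ => mkNoun s (s + "s")
    } ;
</pre>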


<!-- NEW -->
<h4>Pattern matching</h4>

Expressions of the <tt>table</tt> form are built from lists of
argument-value pairs. These pairs are called the <b>branches</b>
of the table. In addition to constants introduced in
<tt>param</tt> definitions, the left-hand side of a branch can more
generally be a <b>pattern</b>, and the computation of selection is
then performed by <b>pattern matching</b>:
<ul>
<li> a variable pattern (identifier other than constant parameter) matches anything
<li> the wild card <tt>_</tt> matches anything
<li> a string literal pattern, e.g. <tt>"s"</tt>, matches the same string
<li> a disjunctive pattern <tt>P | ... | Q</tt> matches anything that
     one of the disjuncts matches
</ul>
Pattern matching is performed in the order in which the branches
appear in the table.

<p>

As syntactic sugar, one-branch tables can be written concisely,
<pre>
  \\P,...,Q => t  ===  table {P => ... table {Q => t} ...}
</pre>
Finally, the <tt>case</tt> expressions common in functional
programming languages are syntactic sugar for table selections:
<pre>
  case e of {...}  ===  table {...} ! e
</pre>
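
<p>

As an illustration of these two equations, the following lines spell out
what the sugared forms stand for. The record for <i>fish</i> is a
hypothetical example; the <tt>case</tt> expression is a shortened version
of the <tt>regNoun</tt> paradigm.
<pre>
  -- a one-branch table with a wild card: the same form for all numbers
  {s = \\_ => "fish"}
  ===
  {s = table {_ => "fish"}}

  -- a case expression and the table selection it abbreviates
  case last s of {
    "y" => mkNoun s (init s + "ies") ;
    _ => mkNoun s (s + "s")
    }
  ===
  table {
    "y" => mkNoun s (init s + "ies") ;
    _ => mkNoun s (s + "s")
    } ! last s
</pre>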


<!-- NEW -->
<h4>Morphological analysis and morphology quiz</h4>

Even though in GF morphology
is mostly seen as an auxiliary of syntax, a morphology once defined
can be used in its own right. The command <tt>morpho_analyse = ma</tt>
can be used to read a text and return for each word the analyses that
it has in the current concrete syntax.
<pre>
  > rf bible.txt | morpho_analyse
</pre>
Similarly to translation exercises, morphological exercises can
be generated, by the command <tt>morpho_quiz = mq</tt>. Usually,
the category is set to be something other than <tt>S</tt>. For instance,
<pre>
  > i lib/resource/french/VerbsFre.gf
  > morpho_quiz -cat=V

  Welcome to GF Morphology Quiz.
  ...

  réapparaître : VFin VCondit Pl P2
  réapparaitriez
  > No, not réapparaitriez, but
  réapparaîtriez
  Score 0/1
</pre>
Finally, a list of morphological exercises can be generated and saved in a
file for later use, by the command <tt>morpho_list = ml</tt>
<pre>
  > morpho_list -number=25 -cat=V
</pre>
The <tt>number</tt> flag gives the number of exercises generated.


<!-- NEW -->
<h4>Parametric vs. inherent features, agreement</h4>

The rule of subject-verb agreement in English says that the verb
phrase must be inflected in the number of the subject. This
means that a noun phrase (functioning as a subject), in some sense
<i>has</i> a number, which it "sends" to the verb. The verb does not
have a number, but must be able to receive whatever number the
subject has. This distinction is nicely represented by the
different linearization types of noun phrases and verb phrases:
<pre>
  lincat NP = {s : Str ; n : Number} ;
  lincat VP = {s : Number => Str} ;
</pre>
We say that the number of <tt>NP</tt> is an <b>inherent feature</b>,
whereas the number of <tt>VP</tt> is a <b>parametric feature</b>.

<p>

The agreement rule itself is expressed in the linearization rule of
the predication structure:
<pre>
  lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
</pre>
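
<p>

To see how the rule works, consider a hand computation with hypothetical
values for <tt>np</tt> and <tt>vp</tt> (the strings are chosen for
illustration only). The inherent number of the noun phrase selects the
form of the verb phrase:
<pre>
  np = {s = "this louse" ; n = Sg}
  vp = {s = table {Sg => "sleeps" ; Pl => "sleep"}}

  PredVP np vp
    = {s = np.s ++ vp.s ! np.n}
    = {s = "this louse" ++ table {Sg => "sleeps" ; Pl => "sleep"} ! Sg}
    = {s = "this louse" ++ "sleeps"}
</pre>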
The following section presents a new version of
<tt>PaleolithicEng</tt>, assuming an abstract syntax
extended with <tt>All</tt> and <tt>Two</tt>.
It also assumes that <tt>MorphoEng</tt> has a paradigm
<tt>regVerb</tt> for regular verbs (which need to be
regular only in the present tense).
The reader is invited to inspect the way in which agreement works in
the formation of noun phrases and verb phrases.


<!-- NEW -->
<h4>English concrete syntax with parameters</h4>

<pre>
concrete PaleolithicEng of Paleolithic = open MorphoEng in {
lincat
  S, A = {s : Str} ;
  VP, CN, V, TV = {s : Number => Str} ;
  NP = {s : Str ; n : Number} ;
lin
  PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
  UseV v = v ;
  ComplTV tv np = {s = \\n => tv.s ! n ++ np.s} ;
  UseA a = {s = \\n => case n of {Sg => "is" ; Pl => "are"} ++ a.s} ;
  -- noun phrases carry their number as an inherent feature
  This cn = {s = "this" ++ cn.s ! Sg ; n = Sg} ;
  Indef cn = {s = "a" ++ cn.s ! Sg ; n = Sg} ;
  All cn = {s = "all" ++ cn.s ! Pl ; n = Pl} ;
  Two cn = {s = "two" ++ cn.s ! Pl ; n = Pl} ;
  ModA a cn = {s = \\n => a.s ++ cn.s ! n} ;
  Louse = mkNoun "louse" "lice" ;
  Snake = regNoun "snake" ;
  Green = {s = "green"} ;
  Warm = {s = "warm"} ;
  Laugh = regVerb "laugh" ;
  Sleep = regVerb "sleep" ;
  Kill = regVerb "kill" ;
}
</pre>
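
<p>

The paradigm <tt>regVerb</tt> assumed above is not defined in this
tutorial. One possible definition in <tt>MorphoEng</tt> is sketched below;
the names <tt>Verb</tt> and <tt>mkVerb</tt> are hypothetical, and the
sketch covers only the third-person present tense forms that this grammar
uses.
<pre>
  oper
    -- verbs have the same linearization type as nouns in this grammar
    Verb : Type = {s : Number => Str} ;
    -- worst-case macro: the forms used with singular and plural subjects
    mkVerb : Str -> Str -> Verb = \sg,pl -> {s = table {Sg => sg ; Pl => pl}} ;
    -- regular verbs: the singular form adds "s", the plural is the bare verb
    regVerb : Str -> Verb = \laugh -> mkVerb (laugh + "s") laugh ;
</pre>
With such a definition, and assuming that the abstract syntax types the
functions in the obvious way, the tree <tt>PredVP (This Louse) (UseV Sleep)</tt>
would be linearized as <i>this louse sleeps</i>.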


<!-- NEW -->
<h4>Hierarchic parameter types</h4>

The reader familiar with a functional programming language such as
<a href="http://www.haskell.org">Haskell</a> will have noticed the similarity
between parameter types in GF and algebraic datatypes (<tt>data</tt> definitions
in Haskell). The GF parameter types are actually a special case of algebraic
datatypes: the main restriction is that in GF, these types must be finite.
(This restriction makes it possible to invert linearization rules into
parsing methods.)

<p>

However, finite is not the same thing as enumerated. Even in GF, parameter
constructors can take arguments, provided these arguments are from other
parameter types (recursion is forbidden). Such parameter types impose a
hierarchic order among parameters. They are often useful to define
linguistically accurate parameter systems.

<p>

To give an example, Swedish adjectives
are inflected in number (singular or plural) and
gender (uter or neuter). These parameters would suggest 2*2=4 different
forms. However, the gender distinction is made only in the singular. Therefore,
it would be inaccurate to define adjective paradigms using the type
<tt>Gender => Number => Str</tt>. The following hierarchic definition
yields an accurate system of three adjectival forms.
<pre>
  param AdjForm = ASg Gender | APl ;
  param Gender = Uter | Neuter ;
</pre>
In pattern matching, a constructor can have patterns as arguments. For instance,
the adjectival paradigm in which the two singular forms are the same can be defined
<pre>
  oper plattAdj : Str -> AdjForm => Str = \x -> table {
    ASg _ => x ;
    APl => x + "a"
    } ;
</pre>
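
<p>

For comparison, a paradigm in which the two singular forms differ matches
on the argument of <tt>ASg</tt>. The following is only a sketch, and the
name <tt>regAdj</tt> is hypothetical; applied to <tt>"varm"</tt> it would
give the forms <i>varm</i>, <i>varmt</i>, <i>varma</i>.
<pre>
  oper regAdj : Str -> AdjForm => Str = \x -> table {
    ASg Uter => x ;
    ASg Neuter => x + "t" ;
    APl => x + "a"
    } ;
</pre>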

<!-- NEW -->
<h2>Topics still to be written</h2>


Discontinuous constituents

<p>

Predefined types and operations

<p>

Lexers and unlexers

<p>

Grammars of formal languages

<p>

Resource grammars and their reuse

<p>

Embedded grammars in Haskell and Java

<p>

Dependent types, variable bindings, semantic definitions

<p>

Transfer rules


</body>
</html>