mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-09 13:09:33 -06:00
584 lines
15 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html><head><title></title></head>
<body bgcolor="#ffffff" text="#000000">
<center>

<img src="../gf-logo.gif">

<h1>Grammatical Framework Tutorial</h1>

<p>

<b>3rd Edition, for GF version 2.2 or later</b>

</p><p>

<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a>

</p>
<p>

<tt>aarne@cs.chalmers.se</tt>

</p></center>

<!-- NEW -->
<h2>GF = Grammatical Framework</h2>

The term GF is used for different things:
<ul>
<li> a <b>program</b> used for working with grammars
<li> a <b>programming language</b> in which grammars can be written
<li> a <b>theory</b> about the concepts of grammars and languages
</ul>

<p>

This tutorial is about the GF program and the GF programming language.
It will guide you
<ul>
<li> to use the GF program
<li> to write GF grammars
<li> to write programs in which GF grammars are used as components
</ul>

<!-- NEW -->
<h3>The GF program</h3>

The program is open-source free software, which you can download from the
GF Homepage:<br>
<a href="http://www.cs.chalmers.se/%7Eaarne/GF">
<tt>http://www.cs.chalmers.se/~aarne/GF</tt></a>

<p>

There you can download
<ul>
<li> ready-made binaries for Linux, Solaris, Macintosh, and Windows
<li> source code and documentation
<li> grammar libraries and examples
</ul>
If you want to compile GF from source, you need Haskell and Java
compilers. But normally you don't have to compile, and you don't
need to know Haskell or Java to use GF.

<p>

To start the GF program, assuming you have installed it, just type
<pre>
gf
</pre>
in the shell. You will see GF's welcome message and the prompt <tt>></tt>.

<!-- NEW -->
<h2>My first grammar</h2>

Now you are ready to try out your first grammar.
We start with one that is not written in the GF language, but
in the EBNF notation (Extended Backus Naur Form), which GF can also
understand. Type (or copy) the following lines into a file named
<tt>paleolithic.ebnf</tt>:
<pre>
S ::= NP VP ;
VP ::= V | TV NP | "is" A ;
NP ::= ("this" | "that" | "the" | "a") CN ;
CN ::= A CN ;
CN ::= "boy" | "louse" | "snake" | "worm" ;
A ::= "green" | "rotten" | "thick" | "warm" ;
V ::= "laughs" | "sleeps" | "swims" ;
TV ::= "eats" | "kills" | "washes" ;
</pre>

<!-- NEW -->
<h3>Importing grammars and parsing strings</h3>

The first GF command when using a grammar is to <b>import</b> it.
The command has a long name, <tt>import</tt>, and a short name, <tt>i</tt>.
<pre>
> import paleolithic.ebnf
</pre>
The GF program now <b>compiles</b> your grammar into an internal
representation, and shows a new prompt when it is ready.

<p>

You can use GF for <b>parsing</b>:
<pre>
> parse "the boy eats a snake"
Mks_0 (Mks_6 Mks_9) (Mks_2 Mks_20 (Mks_7 Mks_11))

> parse "the snake eats a boy"
Mks_0 (Mks_6 Mks_11) (Mks_2 Mks_20 (Mks_7 Mks_9))
</pre>
The <tt>parse</tt> (= <tt>p</tt>) command takes a <b>string</b>
(in double quotes) and returns an <b>abstract syntax tree</b> - the thing
with <tt>Mks</tt>s and parentheses. We will see soon how to make sense
of the abstract syntax trees - now you should just notice that the tree
is different for the two strings.

<p>

Strings that return a tree when parsed do so in virtue of the grammar
you imported. Try parsing something else, and the parse fails:
<pre>
> p "hello world"
No success in cf parsing
no tree found
</pre>

<!-- NEW -->
<h3>Generating trees and strings</h3>

You can also use GF for <b>linearizing</b>
(<tt>linearize = l</tt>). This is the inverse of
parsing, taking trees into strings:
<pre>
> linearize Mks_0 (Mks_6 Mks_11) (Mks_2 Mks_20 (Mks_7 Mks_9))
the snake eats a boy
</pre>
What is the use of this? Typically not that you type in a tree at
the GF prompt. The utility of linearization comes from the fact that
you can obtain a tree from somewhere else. One way to do so is
<b>random generation</b> (<tt>generate_random = gr</tt>):
<pre>
> generate_random
Mks_0 (Mks_4 Mks_11) (Mks_3 Mks_15)
</pre>
Now you can copy the tree and paste it to the <tt>linearize</tt> command.
Or, more efficiently, feed random generation into linearization by using
a <b>pipe</b>:
<pre>
> gr | l
this worm is warm
</pre>

<!-- NEW -->
<h3>Some random-generated sentences</h3>

Random generation can be quite amusing. So you may want to
generate ten strings with one and the same command:
<pre>
> gr -number=10 | l
this boy is green
a snake laughs
the rotten boy is thick
a boy washes this worm
a boy is warm
this green warm boy is rotten
the green thick green louse is rotten
that boy is green
this thick thick boy laughs
a boy is green
</pre>

<!-- NEW -->
<h3>Systematic generation</h3>

To generate <i>all</i> sentences that a grammar
can generate, use the command <tt>generate_trees = gt</tt>
and pipe its output to <tt>linearize</tt>:
<pre>
> generate_trees | l
this louse laughs
this louse sleeps
this louse swims
this louse is green
this louse is rotten
...
a boy is rotten
a boy is thick
a boy is warm
</pre>
You get quite a few trees but not all of them: only up to a given
<b>depth</b> of trees. To see how you can get more, use the
<tt>help = h</tt> command,
<pre>
> h gt
</pre>
<b>Quiz</b>. If the command <tt>gt</tt> generated all
trees in your grammar, it would never terminate. Why?

<!-- NEW -->
<h3>More on pipes; tracing</h3>

A pipe of GF commands can have any length, but the "output type"
(either string or tree) of one command must always match the "input type"
of the next command.

<p>

The intermediate results in a pipe can be observed by adding the
<b>tracing</b> flag <tt>-tr</tt> to each command whose output you
want to see:
<pre>
> gr -tr | l -tr | p
Mks_0 (Mks_7 Mks_10) (Mks_1 Mks_18)
a louse sleeps
Mks_0 (Mks_7 Mks_10) (Mks_1 Mks_18)
</pre>
This facility is good for test purposes: for instance, you
may want to see if a grammar is <b>ambiguous</b>, i.e.
contains strings that can be parsed in more than one way.

<!-- NEW -->
<h3>Writing and reading files</h3>

To save the output of GF commands into a file, you can
pipe it to the <tt>write_file = wf</tt> command,
<pre>
> gr -number=10 | l | write_file exx.tmp
</pre>
You can read the file back to GF with the
<tt>read_file = rf</tt> command,
<pre>
> read_file exx.tmp | p -lines
</pre>
Notice the flag <tt>-lines</tt> given to the parsing
command. This flag tells GF to parse each line of
the file separately. Without the flag, the grammar could
not recognize the string in the file, because it is not
a sentence but a sequence of ten sentences.

<!-- NEW -->
<h3>Labelled context-free grammars</h3>

The syntax trees returned by GF's parser in the previous examples
are not so nice to look at. The identifiers of the form <tt>Mks</tt>
are <b>labels</b> of the EBNF rules. To see which label corresponds to
which rule, you can use the <tt>print_grammar = pg</tt> command
with the <tt>printer</tt> flag set to <tt>cf</tt> (which means context-free):
<pre>
> print_grammar -printer=cf
Mks_10. CN ::= "louse" ;
Mks_11. CN ::= "snake" ;
Mks_12. CN ::= "worm" ;
Mks_8. CN ::= A CN ;
Mks_9. CN ::= "boy" ;
Mks_4. NP ::= "this" CN ;
Mks_15. A ::= "thick" ;
...
</pre>
A syntax tree such as
<pre>
Mks_4 (Mks_8 Mks_15 Mks_12)
this thick worm
</pre>
encodes the sequence of grammar rules used for building the
expression. If you look at this tree, you will notice that <tt>Mks_4</tt>
is the label of the rule prefixing <tt>this</tt> to a common noun,
<tt>Mks_15</tt> is the label of the adjective <tt>thick</tt>,
and so on.

<!-- NEW -->
<h4>The labelled context-free format</h4>

The <b>labelled context-free grammar</b> format lets you attach a
user-defined label to each rule. GF recognizes files of this format by the suffix
<tt>.cf</tt>. Let us include the following rules in the file
<tt>paleolithic.cf</tt>:
<pre>
PredVP. S ::= NP VP ;
UseV. VP ::= V ;
ComplTV. VP ::= TV NP ;
UseA. VP ::= "is" A ;
This. NP ::= "this" CN ;
That. NP ::= "that" CN ;
Def. NP ::= "the" CN ;
Indef. NP ::= "a" CN ;
ModA. CN ::= A CN ;
Boy. CN ::= "boy" ;
Louse. CN ::= "louse" ;
Snake. CN ::= "snake" ;
Worm. CN ::= "worm" ;
Green. A ::= "green" ;
Rotten. A ::= "rotten" ;
Thick. A ::= "thick" ;
Warm. A ::= "warm" ;
Laugh. V ::= "laughs" ;
Sleep. V ::= "sleeps" ;
Swim. V ::= "swims" ;
Eat. TV ::= "eats" ;
Kill. TV ::= "kills" ;
Wash. TV ::= "washes" ;
</pre>

<!-- NEW -->
<h4>Using the labelled context-free format</h4>

The GF commands for the <tt>.cf</tt> format are
exactly the same as for the <tt>.ebnf</tt> format.
Only the syntax trees become easier to read and
to remember. Notice that before importing
a new grammar, you often (but not always,
as we will see later) first have to give the
command <tt>empty = e</tt>, which removes the
old grammar from the GF shell state.
<pre>
> empty

> i paleolithic.cf

> p "the boy eats a snake"
PredVP (Def Boy) (ComplTV Eat (Indef Snake))

> gr -tr | l
PredVP (Indef Louse) (UseA Thick)
a louse is thick
</pre>

<!-- NEW -->
<h2>The GF grammar format</h2>

To see what there really is in GF's shell state when a grammar
has been imported, you can give the plain command
<tt>print_grammar = pg</tt>.
<pre>
> print_grammar
</pre>
The output is quite unreadable at this stage, and you may feel happy that
you did not need to write the grammar in that notation, but that the
GF grammar compiler produced it.

<p>

However, we will now start to show how GF's own notation gives you
much more expressive power than the <tt>.cf</tt> and <tt>.ebnf</tt>
formats. We will introduce the <tt>.gf</tt> format by presenting
one more way of defining the same grammar as in
<tt>paleolithic.cf</tt> and <tt>paleolithic.ebnf</tt>.
Then we will show how the full GF grammar format enables you
to do things that are not possible in the weaker formats.

<!-- NEW -->
<h3>Abstract and concrete syntax</h3>

A GF grammar consists of two main parts:
<ul>
<li> <b>abstract syntax</b>, defining what syntax trees there are
<li> <b>concrete syntax</b>, defining how trees are linearized into strings
</ul>
The EBNF and CF formats fuse these two things together, but it is possible
to take them apart. For instance, the verb phrase predication rule
<pre>
PredVP. S ::= NP VP ;
</pre>
is interpreted as the following pair of rules:
<pre>
fun PredVP : NP -> VP -> S ;
lin PredVP x y = {s = x.s ++ y.s} ;
</pre>
The former rule, with the keyword <tt>fun</tt>, belongs to the abstract syntax.
It defines the <b>function</b>
<tt>PredVP</tt> which constructs syntax trees of form
(<tt>PredVP</tt> <i>x</i> <i>y</i>).

<p>

The latter rule, with the keyword <tt>lin</tt>, belongs to the concrete syntax.
It defines the <b>linearization function</b> for
syntax trees of form (<tt>PredVP</tt> <i>x</i> <i>y</i>).
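
<p>

The other labelled rules can be read in the same way. As a sketch based on
the rules of <tt>paleolithic.cf</tt> (the variable names <tt>x</tt> and
<tt>y</tt> are arbitrary), the rules <tt>ComplTV</tt> and <tt>UseA</tt>
would correspond to the following pairs:
<pre>
fun ComplTV : TV -> NP -> VP ;
lin ComplTV x y = {s = x.s ++ y.s} ;

fun UseA : A -> VP ;
lin UseA x = {s = "is" ++ x.s} ;
</pre>
Notice that the terminal <tt>"is"</tt> appears only in the <tt>lin</tt> rule:
the abstract syntax does not mention any words.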

<!-- NEW -->
<h4>Judgement forms</h4>

Rules in a GF grammar are called <b>judgements</b>, and the keywords
<tt>fun</tt> and <tt>lin</tt> are used for distinguishing between two
<b>judgement forms</b>. Here is a summary of the most important
judgement forms:
<ul>
<li> abstract syntax
<ul>
<li> cat C
<li> fun f : A
</ul>
<li> concrete syntax
<ul>
<li> lincat C = T
<li> lin f x ... y = t
</ul>
</ul>
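<p>

For instance, here is one judgement of each form, as they could be written
for the category <tt>CN</tt> and the function <tt>Boy</tt> of the
paleolithic grammar. This is only a sketch for orientation:
<pre>
cat CN ;                  -- abstract: CN is a category
fun Boy : CN ;            -- abstract: Boy is a tree of category CN
lincat CN = {s : Str} ;   -- concrete: CN trees are linearized to records with a string s
lin Boy = {s = "boy"} ;   -- concrete: the linearization of Boy
</pre>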
We return to the precise meanings of these judgement forms later.
First we will look at how judgements are grouped into modules, and
show how the grammar <tt>paleolithic.cf</tt> is
expressed by using modules and judgements.

<!-- NEW -->
<h4>Module types</h4>

A GF grammar consists of <b>modules</b>,
into which judgements are grouped. The most important
module forms are
<ul>
<li> <tt>abstract</tt> A <tt>=</tt> M, abstract syntax A with judgements in
the module body M.
<li> <tt>concrete</tt> C <tt>of</tt> A <tt>=</tt> M, concrete syntax C of the
abstract syntax A, with judgements in the module body M.
</ul>

<!-- NEW -->
<h4>An abstract syntax example</h4>

Each nonterminal occurring in <tt>paleolithic.cf</tt> is
introduced by a <tt>cat</tt> judgement. Each
rule label is introduced by a <tt>fun</tt> judgement.
<pre>
abstract Paleolithic = {
cat
  S ; NP ; VP ; CN ; A ; V ; TV ;
fun
  PredVP : NP -> VP -> S ;
  UseV : V -> VP ;
  ComplTV : TV -> NP -> VP ;
  UseA : A -> VP ;
  ModA : A -> CN -> CN ;
  This, That, Def, Indef : CN -> NP ;
  Boy, Louse, Snake, Worm : CN ;
  Green, Rotten, Thick, Warm : A ;
  Laugh, Sleep, Swim : V ;
  Eat, Kill, Wash : TV ;
}
</pre>
Notice the use of shorthands permitting the sharing of
the keyword in subsequent judgements, and of the type
in subsequent <tt>fun</tt> judgements.

<!-- NEW -->
<h4>A concrete syntax example</h4>

Each category introduced in <tt>Paleolithic.gf</tt> is
given a <tt>lincat</tt> rule, and each
function is given a <tt>lin</tt> rule. Similar shorthands
apply as in <tt>abstract</tt> modules.
<pre>
concrete PaleolithicEng of Paleolithic = {
lincat
  S, NP, VP, CN, A, V, TV = {s : Str} ;
lin
  PredVP np vp = {s = np.s ++ vp.s} ;
  UseV v = v ;
  ComplTV tv np = {s = tv.s ++ np.s} ;
  UseA a = {s = "is" ++ a.s} ;
  This cn = {s = "this" ++ cn.s} ;
  That cn = {s = "that" ++ cn.s} ;
  Def cn = {s = "the" ++ cn.s} ;
  Indef cn = {s = "a" ++ cn.s} ;
  ModA a cn = {s = a.s ++ cn.s} ;
  Boy = {s = "boy"} ;
  Louse = {s = "louse"} ;
  Snake = {s = "snake"} ;
  Worm = {s = "worm"} ;
  Green = {s = "green"} ;
  Rotten = {s = "rotten"} ;
  Thick = {s = "thick"} ;
  Warm = {s = "warm"} ;
  Laugh = {s = "laughs"} ;
  Sleep = {s = "sleeps"} ;
  Swim = {s = "swims"} ;
  Eat = {s = "eats"} ;
  Kill = {s = "kills"} ;
  Wash = {s = "washes"} ;
}
</pre>
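
<p>

To see how these rules work together, the linearization of the tree
<tt>PredVP (Def Boy) (ComplTV Eat (Indef Snake))</tt> can be traced by hand,
working from the leaves upwards. The following is an informal sketch of the
computation, not output produced by GF:
<pre>
Def Boy                    = {s = "the" ++ "boy"}
Indef Snake                = {s = "a" ++ "snake"}
ComplTV Eat (Indef Snake)  = {s = "eats" ++ "a" ++ "snake"}
PredVP (Def Boy) (ComplTV Eat (Indef Snake))
                           = {s = "the" ++ "boy" ++ "eats" ++ "a" ++ "snake"}
</pre>
The string of the sentence is the <tt>s</tt> field of the last record,
<tt>the boy eats a snake</tt>.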

<!-- NEW -->
<h4>Modules and files</h4>

The name of the file containing a module is the module name followed by
the suffix <tt>.gf</tt>: for instance, the module <tt>PaleolithicEng</tt>
is stored in the file <tt>PaleolithicEng.gf</tt>.

<p>

Each module is compiled into a <tt>.gfc</tt> file.

<p>

Import <tt>PaleolithicEng.gf</tt> and see what happens:
<pre>
> i PaleolithicEng.gf
</pre>
Nothing more happens than before, except that the <tt>.gfc</tt> files
are generated.

<!-- NEW -->
<h4>An Italian concrete syntax</h4>

<pre>
concrete PaleolithicIta of Paleolithic = {
lincat
  S, NP, VP, CN, A, V, TV = {s : Str} ;
lin
  PredVP np vp = {s = np.s ++ vp.s} ;
  UseV v = v ;
  ComplTV tv np = {s = tv.s ++ np.s} ;
  UseA a = {s = "è" ++ a.s} ;
  This cn = {s = "questo" ++ cn.s} ;
  That cn = {s = "quello" ++ cn.s} ;
  Def cn = {s = "il" ++ cn.s} ;
  Indef cn = {s = "un" ++ cn.s} ;
  ModA a cn = {s = cn.s ++ a.s} ;
  Boy = {s = "ragazzo"} ;
  Louse = {s = "pidocchio"} ;
  Snake = {s = "serpente"} ;
  Worm = {s = "verme"} ;
  Green = {s = "verde"} ;
  Rotten = {s = "marcio"} ;
  Thick = {s = "grosso"} ;
  Warm = {s = "caldo"} ;
  Laugh = {s = "ride"} ;
  Sleep = {s = "dorme"} ;
  Swim = {s = "nuota"} ;
  Eat = {s = "mangia"} ;
  Kill = {s = "uccide"} ;
  Wash = {s = "lava"} ;
}
</pre>
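
<p>

The two concrete syntaxes share all trees but may order the words
differently. As an informal sketch (traced by hand, not GF output), the tree
<tt>This (ModA Thick Worm)</tt> is linearized as follows:
<pre>
PaleolithicEng:
  ModA Thick Worm         = {s = "thick" ++ "worm"}
  This (ModA Thick Worm)  = {s = "this" ++ "thick" ++ "worm"}

PaleolithicIta:
  ModA Thick Worm         = {s = "verme" ++ "grosso"}
  This (ModA Thick Worm)  = {s = "questo" ++ "verme" ++ "grosso"}
</pre>
Apart from the words themselves, the only difference between the two
modules is in <tt>ModA</tt>, where the Italian version puts the adjective
after the noun.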

<!-- NEW -->
<h4>Using a multilingual grammar</h4>

Import the Italian grammar without first emptying the shell state:
<pre>
> i PaleolithicIta.gf
</pre>
Try generation now:
<pre>
</pre>
Translate by using a pipe:
<pre>
</pre>
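<p>

As a sketch of the generation and translation steps, and assuming that the
<tt>-lang</tt> flag of <tt>parse</tt> and <tt>linearize</tt> selects a
concrete syntax by its module name (check <tt>h p</tt> and <tt>h l</tt>
in your version of GF), the commands could look as follows:
<pre>
> gr | l -lang=PaleolithicEng

> p -lang=PaleolithicEng "the boy eats a snake" | l -lang=PaleolithicIta
</pre>
By the linearization rules of <tt>PaleolithicIta</tt>, the second command
should return <tt>il ragazzo mangia un serpente</tt>: the English string is
parsed into a tree, and the tree is linearized in Italian.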
Inspect the shell state (<tt>print_options = po</tt>):
<pre>
> print_options
main abstract : Paleolithic
main concrete : PaleolithicIta
all concretes : PaleolithicIta PaleolithicEng
</pre>

<!-- NEW -->
<h4>Extending the grammar</h4>

Neolithic: fire, wheel, think,...
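
<p>

A minimal sketch of such an extension, assuming GF's module extension
operator <tt>**</tt> (which has not been introduced above) and the
hypothetical module names <tt>Neolithic</tt> and <tt>NeolithicEng</tt>:
<pre>
abstract Neolithic = Paleolithic ** {
fun
  Fire, Wheel : CN ;
  Think : V ;
}

concrete NeolithicEng of Neolithic = PaleolithicEng ** {
lin
  Fire = {s = "fire"} ;
  Wheel = {s = "wheel"} ;
  Think = {s = "thinks"} ;
}
</pre>
Each module would go into its own file, <tt>Neolithic.gf</tt> and
<tt>NeolithicEng.gf</tt>. The extending modules inherit the judgements of
the modules they extend, so only the new words need to be written.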

</body>
</html>