<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>Grammatical Framework Tutorial</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR>
Last update: Fri Dec 16 15:10:23 2005
</FONT></CENTER>
<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">GF = Grammatical Framework</A>
<UL>
<LI><A HREF="#toc2">Getting the GF program</A>
</UL>
<LI><A HREF="#toc3">My first grammar</A>
<UL>
<LI><A HREF="#toc4">Importing grammars and parsing strings</A>
<LI><A HREF="#toc5">Generating trees and strings</A>
<LI><A HREF="#toc6">Some random-generated sentences</A>
<LI><A HREF="#toc7">Systematic generation</A>
<LI><A HREF="#toc8">More on pipes; tracing</A>
<LI><A HREF="#toc9">Writing and reading files</A>
<LI><A HREF="#toc10">Labelled context-free grammars</A>
</UL>
<LI><A HREF="#toc11">The GF grammar format</A>
<UL>
<LI><A HREF="#toc12">Abstract and concrete syntax</A>
<LI><A HREF="#toc13">Resource modules</A>
<LI><A HREF="#toc14">Opening a <CODE>resource</CODE></A>
</UL>
<LI><A HREF="#toc15">Topics still to be written</A>
</UL>
<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
<IMG ALIGN="middle" SRC="../gf-logo.gif" BORDER="0" ALT="">
</P>
<A NAME="toc1"></A>
<H2>GF = Grammatical Framework</H2>
<P>
The term GF is used for different things:
</P>
<UL>
<LI>a <B>program</B> used for working with grammars
<LI>a <B>programming language</B> in which grammars can be written
<LI>a <B>theory</B> about grammars and languages
</UL>
<P>
This tutorial is primarily about the GF program and
the GF programming language.
It will guide you
</P>
<UL>
<LI>to use the GF program
<LI>to write GF grammars
<LI>to write programs in which GF grammars are used as components
</UL>
<A NAME="toc2"></A>
<H3>Getting the GF program</H3>
<P>
The program is open-source free software, which you can download via the
GF Homepage:
<A HREF="http://www.cs.chalmers.se/~aarne/GF"><CODE>http://www.cs.chalmers.se/~aarne/GF</CODE></A>
</P>
<P>
There you can download
</P>
<UL>
<LI>ready-made binaries for Linux, Solaris, Macintosh, and Windows
<LI>source code and documentation
<LI>grammar libraries and examples
</UL>
<P>
If you want to compile GF from source, you need Haskell and Java
compilers. But normally you don't have to compile, and you definitely
don't need to know Haskell or Java to use GF.
</P>
<P>
To start the GF program, assuming you have installed it, just type
</P>
<PRE>
  gf
</PRE>
<P>
in the shell. You will see GF's welcome message and the prompt <CODE>></CODE>.
</P>
<A NAME="toc3"></A>
<H2>My first grammar</H2>
<P>
Now you are ready to try out your first grammar.
We start with one that is not written in the GF language, but
in the ubiquitous BNF notation (Backus-Naur Form), which GF can also
understand. Type (or copy) the following lines into a file named
<CODE>paleolithic.ebnf</CODE>:
</P>
<PRE>
  S ::= NP VP ;
  VP ::= V | TV NP | "is" A ;
  NP ::= "this" CN | "that" CN | "the" CN | "a" CN ;
  CN ::= A CN ;
  CN ::= "boy" | "louse" | "snake" | "worm" ;
  A ::= "green" | "rotten" | "thick" | "warm" ;
  V ::= "laughs" | "sleeps" | "swims" ;
  TV ::= "eats" | "kills" | "washes" ;
</PRE>
<P></P>
<P>
(The name <CODE>paleolithic</CODE> refers to a larger package,
<A HREF="http://www.cs.chalmers.se/~aarne/GF/examples/stoneage/">stoneage</A>,
which implements a fragment of primitive language. This fragment
was defined by the linguist Morris Swadesh as a tool for studying
the historical relations of languages. But as pointed out
in the Wiktionary article on the
<A HREF="http://en.wiktionary.org/wiki/Wiktionary:Swadesh_list">Swadesh list</A>, the
fragment is also usable for basic communication with foreigners.)
</P>
<A NAME="toc4"></A>
<H3>Importing grammars and parsing strings</H3>
<P>
The first GF command when using a grammar is to <B>import</B> it.
The command has a long name, <CODE>import</CODE>, and a short name, <CODE>i</CODE>.
</P>
<PRE>
  import paleolithic.ebnf
</PRE>
<P>
The GF program now <B>compiles</B> your grammar into an internal
representation, and shows a new prompt when it is ready.
</P>
<P>
You can use GF for <B>parsing</B>:
</P>
<PRE>
  > parse "the boy eats a snake"
  Mks_0 (Mks_6 Mks_9) (Mks_2 Mks_20 (Mks_7 Mks_11))

  > parse "the snake eats a boy"
  Mks_0 (Mks_6 Mks_11) (Mks_2 Mks_20 (Mks_7 Mks_9))
</PRE>
<P>
The <CODE>parse</CODE> (= <CODE>p</CODE>) command takes a <B>string</B>
(in double quotes) and returns an <B>abstract syntax tree</B> - the thing
with <CODE>Mks</CODE>s and parentheses. We will soon see how to make sense
of abstract syntax trees - for now, just notice that the tree
is different for the two strings.
</P>
<P>
Strings that return a tree when parsed do so by virtue of the grammar
you imported. Try parsing something else, and you will fail:
</P>
<PRE>
  > p "hello world"
  No success in cf parsing
  no tree found
</PRE>
<P></P>
<A NAME="toc5"></A>
<H3>Generating trees and strings</H3>
<P>
You can also use GF for <B>linearizing</B>
(<CODE>linearize = l</CODE>). This is the inverse of
parsing, taking trees into strings:
</P>
<PRE>
  > linearize Mks_0 (Mks_6 Mks_11) (Mks_2 Mks_20 (Mks_7 Mks_9))
  the snake eats a boy
</PRE>
<P>
What is the use of this? Typically not that you type in a tree at
the GF prompt. The utility of linearization comes from the fact that
you can obtain a tree from somewhere else. One way to do so is
<B>random generation</B> (<CODE>generate_random = gr</CODE>):
</P>
<PRE>
  > generate_random
  Mks_0 (Mks_4 Mks_11) (Mks_3 Mks_15)
</PRE>
<P>
Now you can copy the tree and paste it to the <CODE>linearize</CODE> command.
Or, more efficiently, feed random generation into linearization by using
a <B>pipe</B>:
</P>
<PRE>
  > gr | l
  this worm is warm
</PRE>
<P></P>
<A NAME="toc6"></A>
<H3>Some random-generated sentences</H3>
<P>
Random generation can be quite amusing. So you may want to
generate ten strings with one and the same command:
</P>
<PRE>
  > gr -number=10 | l
  this boy is green
  a snake laughs
  the rotten boy is thick
  a boy washes this worm
  a boy is warm
  this green warm boy is rotten
  the green thick green louse is rotten
  that boy is green
  this thick thick boy laughs
  a boy is green
</PRE>
<P></P>
<A NAME="toc7"></A>
<H3>Systematic generation</H3>
<P>
To generate <i>all</i> sentences that a grammar
can generate, use the command <CODE>generate_trees = gt</CODE>.
</P>
<PRE>
  > generate_trees | l
  this louse laughs
  this louse sleeps
  this louse swims
  this louse is green
  this louse is rotten
  ...
  a boy is rotten
  a boy is thick
  a boy is warm
</PRE>
<P>
You get quite a few trees, but not all of them: only trees up to a given
<B>depth</B>. To see how you can get more, use the
<CODE>help = h</CODE> command:
</P>
<PRE>
  help gt
</PRE>
<P>
<B>Quiz</B>. If the command <CODE>gt</CODE> generated all
trees in your grammar, it would never terminate. Why?
</P>
<A NAME="toc8"></A>
<H3>More on pipes; tracing</H3>
<P>
A pipe of GF commands can have any length, but the "output type"
(either string or tree) of one command must always match the "input type"
of the next command.
</P>
<P>
The intermediate results in a pipe can be observed by giving the
<B>tracing</B> flag <CODE>-tr</CODE> to each command whose output you
want to see:
</P>
<PRE>
  > gr -tr | l -tr | p
  Mks_0 (Mks_7 Mks_10) (Mks_1 Mks_18)
  a louse sleeps
  Mks_0 (Mks_7 Mks_10) (Mks_1 Mks_18)
</PRE>
<P>
This facility is good for testing purposes: for instance, you
may want to see if a grammar is <B>ambiguous</B>, i.e.
contains strings that can be parsed in more than one way.
</P>
<A NAME="toc9"></A>
<H3>Writing and reading files</H3>
<P>
To save the output of GF commands into a file, you can
pipe it to the <CODE>write_file = wf</CODE> command:
</P>
<PRE>
  > gr -number=10 | l | write_file exx.tmp
</PRE>
<P>
You can read the file back into GF with the
<CODE>read_file = rf</CODE> command:
</P>
<PRE>
  > read_file exx.tmp | l -tr | p -lines
</PRE>
<P>
Notice the flag <CODE>-lines</CODE> given to the parsing
command. This flag tells GF to parse each line of
the file separately. Without the flag, the grammar could
not recognize the contents of the file, because they are not
a single sentence but a sequence of ten sentences.
</P>
<A NAME="toc10"></A>
<H3>Labelled context-free grammars</H3>
<P>
The syntax trees returned by GF's parser in the previous examples
are not so nice to look at. The identifiers of the form <CODE>Mks</CODE>
are <B>labels</B> of the EBNF rules. To see which label corresponds to
which rule, you can use the <CODE>print_grammar = pg</CODE> command
with the <CODE>printer</CODE> flag set to <CODE>cf</CODE> (which means context-free):
</P>
<PRE>
  > print_grammar -printer=cf
  Mks_10. CN ::= "louse" ;
  Mks_11. CN ::= "snake" ;
  Mks_12. CN ::= "worm" ;
  Mks_8. CN ::= A CN ;
  Mks_9. CN ::= "boy" ;
  Mks_4. NP ::= "this" CN ;
  Mks_15. A ::= "thick" ;
  ...
</PRE>
<P>
A syntax tree such as
</P>
<PRE>
  Mks_4 (Mks_8 Mks_15 Mks_12)
  this thick worm
</PRE>
<P>
encodes the sequence of grammar rules used for building the
expression. If you look at this tree, you will notice that <CODE>Mks_4</CODE>
is the label of the rule prefixing <CODE>this</CODE> to a common noun,
<CODE>Mks_15</CODE> is the label of the adjective <CODE>thick</CODE>,
and so on.
</P>
<P>
<h4>The labelled context-free format</h4>
</P>
<P>
The <B>labelled context-free grammar</B> format permits a user-defined
label for each rule.
GF recognizes files of this format by the suffix
<CODE>.cf</CODE>. It is intermediate between EBNF and the full GF format.
Let us put the following rules in the file
<CODE>paleolithic.cf</CODE>.
</P>
<PRE>
  PredVP. S ::= NP VP ;
  UseV. VP ::= V ;
  ComplTV. VP ::= TV NP ;
  UseA. VP ::= "is" A ;
  This. NP ::= "this" CN ;
  That. NP ::= "that" CN ;
  Def. NP ::= "the" CN ;
  Indef. NP ::= "a" CN ;
  ModA. CN ::= A CN ;
  Boy. CN ::= "boy" ;
  Louse. CN ::= "louse" ;
  Snake. CN ::= "snake" ;
  Worm. CN ::= "worm" ;
  Green. A ::= "green" ;
  Rotten. A ::= "rotten" ;
  Thick. A ::= "thick" ;
  Warm. A ::= "warm" ;
  Laugh. V ::= "laughs" ;
  Sleep. V ::= "sleeps" ;
  Swim. V ::= "swims" ;
  Eat. TV ::= "eats" ;
  Kill. TV ::= "kills" ;
  Wash. TV ::= "washes" ;
</PRE>
<P></P>
<P>
<h4>Using the labelled context-free format</h4>
</P>
<P>
The GF commands for the <CODE>.cf</CODE> format are
exactly the same as for the <CODE>.ebnf</CODE> format.
Just the syntax trees become nicer to read and
to remember. Notice that before reading in
a new grammar in GF you often (but not always,
as we will see later) first have to give the
command <CODE>empty = e</CODE>, which removes the
old grammar from the GF shell state.
</P>
<PRE>
  > empty

  > i paleolithic.cf

  > p "the boy eats a snake"
  PredVP (Def Boy) (ComplTV Eat (Indef Snake))

  > gr -tr | l
  PredVP (Indef Louse) (UseA Thick)
  a louse is thick
</PRE>
<P></P>
<A NAME="toc11"></A>
<H2>The GF grammar format</H2>
<P>
To see what there really is in GF's shell state when a grammar
has been imported, you can give the plain command
<CODE>print_grammar = pg</CODE>.
</P>
<PRE>
  > print_grammar
</PRE>
<P>
The output is quite unreadable at this stage, and you may feel happy that
you did not need to write the grammar in that notation, but that the
GF grammar compiler produced it.
</P>
<P>
However, we will now start to show how GF's own notation gives you
much more expressive power than the <CODE>.cf</CODE> and <CODE>.ebnf</CODE>
formats. We will introduce the <CODE>.gf</CODE> format by presenting
one more way of defining the same grammar as in
<CODE>paleolithic.cf</CODE> and <CODE>paleolithic.ebnf</CODE>.
Then we will show how the full GF grammar format enables you
to do things that are not possible in the weaker formats.
</P>
<A NAME="toc12"></A>
<H3>Abstract and concrete syntax</H3>
<P>
A GF grammar consists of two main parts:
</P>
<UL>
<LI><B>abstract syntax</B>, defining what syntax trees there are
<LI><B>concrete syntax</B>, defining how trees are linearized into strings
</UL>
<P>
The EBNF and CF formats fuse these two things together, but it is possible
to take them apart. For instance, the verb phrase predication rule
</P>
<PRE>
  PredVP. S ::= NP VP ;
</PRE>
<P>
is interpreted as the following pair of rules:
</P>
<PRE>
  fun PredVP : NP -> VP -> S ;
  lin PredVP x y = {s = x.s ++ y.s} ;
</PRE>
<P>
The former rule, with the keyword <CODE>fun</CODE>, belongs to the abstract syntax.
It defines the <B>function</B>
<CODE>PredVP</CODE>, which constructs syntax trees of the form
(<CODE>PredVP</CODE> <i>x</i> <i>y</i>).
</P>
<P>
The latter rule, with the keyword <CODE>lin</CODE>, belongs to the concrete syntax.
It defines the <B>linearization function</B> for
syntax trees of the form (<CODE>PredVP</CODE> <i>x</i> <i>y</i>).
</P>
<P>
<h4>Judgement forms</h4>
</P>
<P>
Rules in a GF grammar are called <B>judgements</B>, and the keywords
<CODE>fun</CODE> and <CODE>lin</CODE> are used for distinguishing between two
<B>judgement forms</B>. Here is a summary of the most important
judgement forms:
</P>
<UL>
<LI>abstract syntax
<P></P>
</UL>
<TABLE ALIGN="center" CELLPADDING="4" BORDER="1">
<TR>
<TD>form</TD>
<TD>reading</TD>
</TR>
<TR>
<TD><CODE>cat</CODE> C</TD>
<TD>C is a category</TD>
</TR>
<TR>
<TD><CODE>fun</CODE> f <CODE>:</CODE> A</TD>
<TD>f is a function of type A</TD>
</TR>
</TABLE>
<P></P>
<UL>
<LI>concrete syntax
<P></P>
</UL>
<TABLE ALIGN="center" CELLPADDING="4" BORDER="1">
<TR>
<TD>form</TD>
<TD>reading</TD>
</TR>
<TR>
<TD><CODE>lincat</CODE> C <CODE>=</CODE> T</TD>
<TD>category C has linearization type T</TD>
</TR>
<TR>
<TD><CODE>lin</CODE> f <CODE>=</CODE> t</TD>
<TD>function f has linearization t</TD>
</TR>
</TABLE>
<P></P>
<P>
We return to the precise meanings of these judgement forms later.
First we will look at how judgements are grouped into modules, and
show how the grammar <CODE>paleolithic.cf</CODE> is
expressed by using modules and judgements.
</P>
<P>
<h4>Module types</h4>
</P>
<P>
A GF grammar consists of <B>modules</B>,
into which judgements are grouped. The most important
module forms are
</P>
<UL>
<LI><CODE>abstract</CODE> A <CODE>=</CODE> M, an abstract syntax A with judgements in
the module body M
<LI><CODE>concrete</CODE> C <CODE>of</CODE> A <CODE>=</CODE> M, a concrete syntax C of the
abstract syntax A, with judgements in the module body M
</UL>
<P>
<h4>Record types, records, and <CODE>Str</CODE>s</h4>
</P>
<P>
The linearization type of a category is a <B>record type</B>, with
zero or more <B>fields</B> of different types. The simplest record
type used for linearization in GF is
</P>
<PRE>
  {s : Str}
</PRE>
<P>
which has one field, with <B>label</B> <CODE>s</CODE> and type <CODE>Str</CODE>.
</P>
<P>
Examples of records of this type are
</P>
<PRE>
  {s = "foo"}
  {s = "hello" ++ "world"}
</PRE>
<P>
The type <CODE>Str</CODE> is really the type of <B>token lists</B>, but
most of the time one can conveniently think of it as the type of strings,
denoted by string literals in double quotes.
</P>
<P>
Whenever a record <CODE>r</CODE> of type <CODE>{s : Str}</CODE> is given,
<CODE>r.s</CODE> is an object of type <CODE>Str</CODE>. This is of course
a special case of the <B>projection</B> rule, allowing the extraction
of fields from a record.
</P>
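<P>
For instance (a minimal illustration using the record type above): if
<CODE>r</CODE> is the record <CODE>{s = "boy"}</CODE>, then the projection
<CODE>r.s</CODE> yields the string <CODE>"boy"</CODE>. In a linearization
rule, projection is typically combined with concatenation:
</P>
<PRE>
  lin Def cn = {s = "the" ++ cn.s} ;
</PRE>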
<P>
<h4>An abstract syntax example</h4>
</P>
<P>
Each nonterminal occurring in the grammar <CODE>paleolithic.cf</CODE> is
introduced by a <CODE>cat</CODE> judgement. Each
rule label is introduced by a <CODE>fun</CODE> judgement.
</P>
<PRE>
  abstract Paleolithic = {
  cat
    S ; NP ; VP ; CN ; A ; V ; TV ;
  fun
    PredVP : NP -> VP -> S ;
    UseV : V -> VP ;
    ComplTV : TV -> NP -> VP ;
    UseA : A -> VP ;
    ModA : A -> CN -> CN ;
    This, That, Def, Indef : CN -> NP ;
    Boy, Louse, Snake, Worm : CN ;
    Green, Rotten, Thick, Warm : A ;
    Laugh, Sleep, Swim : V ;
    Eat, Kill, Wash : TV ;
  }
</PRE>
<P>
Notice the use of shorthands permitting the sharing of
the keyword in subsequent judgements, and of the type
in subsequent <CODE>fun</CODE> judgements.
</P>
<P>
<h4>A concrete syntax example</h4>
</P>
<P>
Each category introduced in <CODE>Paleolithic.gf</CODE> is
given a <CODE>lincat</CODE> rule, and each
function is given a <CODE>lin</CODE> rule. Similar shorthands
apply as in <CODE>abstract</CODE> modules.
</P>
<PRE>
  concrete PaleolithicEng of Paleolithic = {
  lincat
    S, NP, VP, CN, A, V, TV = {s : Str} ;
  lin
    PredVP np vp = {s = np.s ++ vp.s} ;
    UseV v = v ;
    ComplTV tv np = {s = tv.s ++ np.s} ;
    UseA a = {s = "is" ++ a.s} ;
    This cn = {s = "this" ++ cn.s} ;
    That cn = {s = "that" ++ cn.s} ;
    Def cn = {s = "the" ++ cn.s} ;
    Indef cn = {s = "a" ++ cn.s} ;
    ModA a cn = {s = a.s ++ cn.s} ;
    Boy = {s = "boy"} ;
    Louse = {s = "louse"} ;
    Snake = {s = "snake"} ;
    Worm = {s = "worm"} ;
    Green = {s = "green"} ;
    Rotten = {s = "rotten"} ;
    Thick = {s = "thick"} ;
    Warm = {s = "warm"} ;
    Laugh = {s = "laughs"} ;
    Sleep = {s = "sleeps"} ;
    Swim = {s = "swims"} ;
    Eat = {s = "eats"} ;
    Kill = {s = "kills"} ;
    Wash = {s = "washes"} ;
  }
</PRE>
<P></P>
<P>
<h4>Modules and files</h4>
</P>
<P>
The file name of a module is the module name + <CODE>.gf</CODE>.
</P>
<P>
Each module is compiled into a <CODE>.gfc</CODE> file.
</P>
<P>
Import <CODE>PaleolithicEng.gf</CODE> and see what happens:
</P>
<PRE>
  > i PaleolithicEng.gf
</PRE>
<P>
The GF program does not only read the file
<CODE>PaleolithicEng.gf</CODE>, but also all other files that it
depends on - in this case, <CODE>Paleolithic.gf</CODE>.
</P>
<P>
For each file that is compiled, a <CODE>.gfc</CODE> file
is generated. The GFC format (= "GF Canonical") is the
"machine code" of GF, which is faster to process than
GF source files. When reading a module, GF knows whether
to use an existing <CODE>.gfc</CODE> file or to generate
a new one, by looking at modification times.
</P>
<P>
<h4>Multilingual grammar</h4>
</P>
<P>
The main advantage of separating abstract from concrete syntax is that
one abstract syntax can be equipped with many concrete syntaxes.
A system with this property is called a <B>multilingual grammar</B>.
</P>
<P>
Multilingual grammars can be used for applications such as
translation. Let us build an Italian concrete syntax for
<CODE>Paleolithic</CODE> and then test the resulting
multilingual grammar.
</P>
<P>
<h4>An Italian concrete syntax</h4>
</P>
<PRE>
  concrete PaleolithicIta of Paleolithic = {
  lincat
    S, NP, VP, CN, A, V, TV = {s : Str} ;
  lin
    PredVP np vp = {s = np.s ++ vp.s} ;
    UseV v = v ;
    ComplTV tv np = {s = tv.s ++ np.s} ;
    UseA a = {s = "è" ++ a.s} ;
    This cn = {s = "questo" ++ cn.s} ;
    That cn = {s = "quello" ++ cn.s} ;
    Def cn = {s = "il" ++ cn.s} ;
    Indef cn = {s = "un" ++ cn.s} ;
    ModA a cn = {s = cn.s ++ a.s} ;
    Boy = {s = "ragazzo"} ;
    Louse = {s = "pidocchio"} ;
    Snake = {s = "serpente"} ;
    Worm = {s = "verme"} ;
    Green = {s = "verde"} ;
    Rotten = {s = "marcio"} ;
    Thick = {s = "grosso"} ;
    Warm = {s = "caldo"} ;
    Laugh = {s = "ride"} ;
    Sleep = {s = "dorme"} ;
    Swim = {s = "nuota"} ;
    Eat = {s = "mangia"} ;
    Kill = {s = "uccide"} ;
    Wash = {s = "lava"} ;
  }
</PRE>
<P></P>
<P>
<h4>Using a multilingual grammar</h4>
</P>
<P>
Import both concrete syntaxes, this time without emptying the shell state
in between:
</P>
<PRE>
  > i PaleolithicEng.gf
  > i PaleolithicIta.gf
</PRE>
<P>
Try generation now:
</P>
<PRE>
  > gr | l
  un pidocchio uccide questo ragazzo

  > gr | l -lang=PaleolithicEng
  that louse eats a louse
</PRE>
<P>
Translate by using a pipe:
</P>
<PRE>
  > p -lang=PaleolithicEng "the boy eats the snake" | l -lang=PaleolithicIta
  il ragazzo mangia il serpente
</PRE>
<P></P>
<P>
<h4>Translation quiz</h4>
</P>
<P>
This is a simple language exercise that can be automatically
generated from a multilingual grammar. The system generates a set of
random sentences, displays them in one language, and checks the user's
answer given in another language. The command <CODE>translation_quiz = tq</CODE>
does this in a subshell of GF.
</P>
<PRE>
  > translation_quiz PaleolithicEng PaleolithicIta

  Welcome to GF Translation Quiz.
  The quiz is over when you have done at least 10 examples
  with at least 75 % success.
  You can interrupt the quiz by entering a line consisting of a dot ('.').

  a green boy washes the louse
  un ragazzo verde lava il gatto

  No, not un ragazzo verde lava il gatto, but
  un ragazzo verde lava il pidocchio
  Score 0/1
</PRE>
<P>
You can also generate a list of translation exercises and save it in a
file for later use, with the command <CODE>translation_list = tl</CODE>:
</P>
<PRE>
  > translation_list -number=25 PaleolithicEng PaleolithicIta
</PRE>
<P>
The <CODE>number</CODE> flag gives the number of sentences generated.
</P>
<P>
<h4>The multilingual shell state</h4>
</P>
<P>
A GF shell is at any time in a state, which
contains a multilingual grammar. One of the concrete
syntaxes is the "main" one, which means that parsing and linearization
are performed by using it. By default, the main concrete syntax is the
last-imported one. As we saw above, the <CODE>lang</CODE> flag
can be used to change the linearization and parsing grammar.
</P>
<P>
To see what the multilingual grammar is (as well as some other
things), you can use the command
<CODE>print_options = po</CODE>:
</P>
<PRE>
  > print_options
  main abstract : Paleolithic
  main concrete : PaleolithicIta
  all concretes : PaleolithicIta PaleolithicEng
</PRE>
<P></P>
<P>
<h4>Extending a grammar</h4>
</P>
<P>
The module system of GF makes it possible to <B>extend</B> a
grammar in different ways. The syntax of extension is
shown by the following example.
</P>
<PRE>
  abstract Neolithic = Paleolithic ** {
  fun
    Fire, Wheel : CN ;
    Think : V ;
  }
</PRE>
<P>
Parallel to the abstract syntax, extensions can
be built for concrete syntaxes:
</P>
<PRE>
  concrete NeolithicEng of Neolithic = PaleolithicEng ** {
  lin
    Fire = {s = "fire"} ;
    Wheel = {s = "wheel"} ;
    Think = {s = "thinks"} ;
  }
</PRE>
<P>
The effect of extension is that all of the contents of the extended
and the extending modules are put together.
</P>
<P>
<h4>Multiple inheritance</h4>
</P>
<P>
Specialized vocabularies can be represented as small grammars that
only do "one thing" each, e.g.
</P>
<PRE>
  abstract Fish = {
  cat Fish ;
  fun Salmon, Perch : Fish ;
  }

  abstract Mushrooms = {
  cat Mushroom ;
  fun Cep, Agaric : Mushroom ;
  }
</PRE>
<P>
They can afterwards be combined into bigger grammars by using
<B>multiple inheritance</B>, i.e. extension of several grammars at the
same time:
</P>
<PRE>
  abstract Gatherer = Paleolithic, Fish, Mushrooms ** {
  fun
    UseFish : Fish -> CN ;
    UseMushroom : Mushroom -> CN ;
  }
</PRE>
<P></P>
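<P>
A concrete syntax for <CODE>Gatherer</CODE> is built by extending the
smaller concrete modules in the same way. The following is a sketch of
our own (the module names <CODE>FishEng</CODE> and <CODE>MushroomsEng</CODE>
are assumptions, as is the use of the <CODE>{s : Str}</CODE> linearization
type throughout, which lets the new functions simply pass their argument
records through):
</P>
<PRE>
  concrete GathererEng of Gatherer = PaleolithicEng, FishEng, MushroomsEng ** {
  lin
    UseFish f = f ;
    UseMushroom m = m ;
  }
</PRE>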
<P>
<h4>Visualizing module structure</h4>
</P>
<P>
When you have created all the abstract syntaxes and
one set of concrete syntaxes needed for <CODE>Gatherer</CODE>,
your grammar consists of eight GF modules. To see what their
dependencies look like, you can use the command
<CODE>visualize_graph = vg</CODE>,
</P>
<PRE>
  > visualize_graph
</PRE>
<P>
and the graph will pop up in a separate window. It can also
be printed out into a file, e.g. a <CODE>.gif</CODE> file that
can be included in an HTML document:
</P>
<PRE>
  > pm -printer=graph | wf Gatherer.dot
  > ! dot -Tgif Gatherer.dot > Gatherer.gif
</PRE>
<P>
The latter command is a Unix command, issued from GF by using the
shell escape symbol <CODE>!</CODE>. The resulting graph is shown in the next section.
</P>
<P>
The command <CODE>print_multi = pm</CODE> is used for printing the current multilingual
grammar in various formats, of which the format <CODE>-printer=graph</CODE> just
shows the module dependencies.
</P>
<P>
<h4>The module structure of <CODE>GathererEng</CODE></h4>
</P>
<P>
The graph uses
</P>
<UL>
<LI>oval boxes for abstract modules
<LI>square boxes for concrete modules
<LI>black-headed arrows for inheritance
<LI>white-headed arrows for the concrete-of-abstract relation
</UL>
<P>
<img src="Gatherer.gif">
</P>
<A NAME="toc13"></A>
<H3>Resource modules</H3>
<P>
Suppose we want to say, with the vocabulary included in
<CODE>Paleolithic.gf</CODE>, things like
</P>
<PRE>
  the boy eats two snakes
  all boys sleep
</PRE>
<P>
The new grammatical facility we need is the plural forms
of nouns and verbs (<i>boys, sleep</i>), as opposed to their
singular forms.
</P>
<P>
The introduction of plural forms requires two things:
</P>
<UL>
<LI>to <B>inflect</B> nouns and verbs in singular and plural number
<LI>to describe the <B>agreement</B> of the verb with the subject: the
rule that the verb must have the same number as the subject
</UL>
<P>
Different languages have different rules of inflection and agreement.
For instance, Italian also has agreement in gender (masculine vs. feminine).
We want to express such special features of languages precisely in
concrete syntax while ignoring them in abstract syntax.
</P>
<P>
To be able to do all this, we need two new judgement forms,
a new module form, and a generalization of linearization types
from strings to more complex types.
</P>
<P>
<h4>Parameters and tables</h4>
</P>
<P>
We define the <B>parameter type</B> of number in English by
using a new form of judgement:
</P>
<PRE>
  param Number = Sg | Pl ;
</PRE>
<P>
To express that nouns in English have a linearization
depending on number, we replace the linearization type <CODE>{s : Str}</CODE>
with a type where the <CODE>s</CODE> field is a <B>table</B> depending on number:
</P>
<PRE>
  lincat CN = {s : Number => Str} ;
</PRE>
<P>
The <B>table type</B> <CODE>Number => Str</CODE> is in many respects similar to
a function type (<CODE>Number -> Str</CODE>). The main restriction is that the
argument type of a table type must always be a parameter type. This means
that the argument-value pairs can be listed in a finite table. The following
example shows such a table:
</P>
<PRE>
  lin Boy = {s = table {
    Sg => "boy" ;
    Pl => "boys"
    }
  } ;
</PRE>
<P>
The application of a table to a parameter is done by the <B>selection</B>
operator <CODE>!</CODE>. For instance,
</P>
<PRE>
  Boy.s ! Pl
</PRE>
<P>
is a selection, whose value is <CODE>"boys"</CODE>.
</P>
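<P>
A selection like this can be used inside a linearization rule. As a small
sketch (assuming the <CODE>lincat</CODE> of <CODE>CN</CODE> above, with
<CODE>NP</CODE> still linearized as a plain string), a determiner rule could
pick out the singular form of the noun:
</P>
<PRE>
  lin This cn = {s = "this" ++ cn.s ! Sg} ;
</PRE>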
<P>
<h4>Inflection tables, paradigms, and <CODE>oper</CODE> definitions</h4>
</P>
<P>
All English common nouns are inflected in number, most of them in the
same way: the plural form is formed from the singular form by adding the
ending <i>s</i>. This rule is an example of
a <B>paradigm</B> - a formula telling how the inflection
forms of a word are formed.
</P>
<P>
From the GF point of view, a paradigm is a function that takes a <B>lemma</B> -
a string also known as a <B>dictionary form</B> - and returns an inflection
table of the desired type. Paradigms are not functions in the sense of the
<CODE>fun</CODE> judgements of abstract syntax (which operate on trees and not
on strings). Thus we call them <B>operations</B> for the sake of clarity,
and introduce a new form of judgement, with the keyword <CODE>oper</CODE>. As an
example, the following operation defines the regular noun paradigm of English:
</P>
<PRE>
  oper regNoun : Str -> {s : Number => Str} = \x -> {
    s = table {
      Sg => x ;
      Pl => x + "s"
      }
    } ;
</PRE>
<P>
Thus an <CODE>oper</CODE> judgement includes the name of the defined operation,
its type, and an expression defining it. As for the syntax of the defining
expression, notice the <B>lambda abstraction</B> form <CODE>\x -> t</CODE> of
the function, and the <B>gluing</B> operator <CODE>+</CODE> telling that
the string held in the variable <CODE>x</CODE> and the ending <CODE>"s"</CODE>
are written together to form one <B>token</B>.
</P>
<P>
<h4>The <CODE>resource</CODE> module type</h4>
</P>
<P>
Parameter and operation definitions do not belong to the abstract syntax.
They can be used when defining concrete syntax - but they are not
tied to a particular set of linearization rules.
The proper way to see them is as auxiliary concepts, as <B>resources</B>
usable in many concrete syntaxes.
</P>
<P>
The <CODE>resource</CODE> module type thus consists of
<CODE>param</CODE> and <CODE>oper</CODE> definitions. Here is an
example.
</P>
<PRE>
resource MorphoEng = {
  param
    Number = Sg | Pl ;
  oper
    Noun : Type = {s : Number => Str} ;
    regNoun : Str -> Noun = \x -> {
      s = table {
        Sg => x ;
        Pl => x + "s"
        }
      } ;
}
</PRE>
<P>
Resource modules can extend other resource modules, in the
same way as modules of other types can extend modules of the
same type.
</P>
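<P>
For instance, a resource extending <CODE>MorphoEng</CODE> could add a type and
a paradigm for verbs, while inheriting everything defined for nouns. The
following is only a sketch: it assumes the module extension syntax with
<CODE>**</CODE>, and the names <CODE>MorphoEng2</CODE>, <CODE>Verb</CODE>, and
<CODE>regVerb</CODE> are illustrative.
</P>
<PRE>
resource MorphoEng2 = MorphoEng ** {
  oper
    Verb : Type = {s : Number => Str} ;
    -- present-tense forms: "laughs" / "laugh"
    regVerb : Str -> Verb = \x -> {
      s = table {
        Sg => x + "s" ;
        Pl => x
        }
      } ;
}
</PRE>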
<A NAME="toc14"></A>
<H3>Opening a ``resource``</H3>
<P>
Any number of <CODE>resource</CODE> modules can be
<B>opened</B> in a <CODE>concrete</CODE> syntax, which
makes the parameter and operation definitions contained
in the resource usable in the concrete syntax. Here is
an example, where the resource <CODE>MorphoEng</CODE> is
opened in (a fragment of) a new version of <CODE>PaleolithicEng</CODE>.
</P>
<PRE>
concrete PaleolithicEng of Paleolithic = open MorphoEng in {
  lincat
    CN = Noun ;
  lin
    Boy = regNoun "boy" ;
    Snake = regNoun "snake" ;
    Worm = regNoun "worm" ;
}
</PRE>
<P>
Notice that, just like in abstract syntax, function application
is written by juxtaposition of the function and the argument.
</P>
<P>
Using operations defined in resource modules is clearly a concise
way of giving e.g. inflection tables and other repeated patterns
of expression. In addition, it enables a new kind of modularity
and division of labour in grammar writing: grammarians familiar with
the linguistic details of a language can make this knowledge
available through resource grammars, whose users only need
to pick the right operations, without knowing their implementation
details.
</P>
<P>
<h4>Worst-case macros and data abstraction</h4>
</P>
<P>
Some English nouns, such as <CODE>louse</CODE>, are so irregular that
it makes little sense to see them as instances of a paradigm. Even
then, it is useful to perform <B>data abstraction</B> from the
definition of the type <CODE>Noun</CODE>, and introduce a constructor
operation, a <B>worst-case macro</B> for nouns:
</P>
<PRE>
oper mkNoun : Str -> Str -> Noun = \x,y -> {
  s = table {
    Sg => x ;
    Pl => y
    }
  } ;
</PRE>
<P>
Thus we define
</P>
<PRE>
lin Louse = mkNoun "louse" "lice" ;
</PRE>
<P>
instead of writing the inflection table explicitly.
</P>
<P>
The grammar engineering advantage of worst-case macros is that
the author of the resource module may change the definitions of
<CODE>Noun</CODE> and <CODE>mkNoun</CODE>, and still retain the
interface (i.e. the system of type signatures) that makes it
correct to use these functions in concrete modules. In programming
terms, <CODE>Noun</CODE> is then treated as an <B>abstract datatype</B>.
</P>
<P>
<h4>A system of paradigms using <CODE>Prelude</CODE> operations</h4>
</P>
<P>
The regular noun paradigm <CODE>regNoun</CODE> can - and should - of course be defined
by the worst-case macro <CODE>mkNoun</CODE>. In addition, some more noun paradigms
could be defined, for instance,
</P>
<PRE>
regNoun : Str -> Noun = \snake -> mkNoun snake (snake + "s") ;
sNoun   : Str -> Noun = \kiss  -> mkNoun kiss (kiss + "es") ;
</PRE>
<P>
What about nouns like <i>fly</i>, with the plural <i>flies</i>? An
immediately available solution is to use the so-called "technical stem" <i>fl</i> as
the argument, and define
</P>
<PRE>
yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ;
</PRE>
<P>
But this paradigm would be very unintuitive to use, because the "technical stem"
is not even an existing form of the word. A better solution is to use
the string operator <CODE>init</CODE>, which returns the initial segment (i.e.
all characters but the last) of a string:
</P>
<PRE>
yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
</PRE>
<P>
The operator <CODE>init</CODE> belongs to a set of operations in the
resource module <CODE>Prelude</CODE>, which therefore has to be
<CODE>open</CODE>ed so that <CODE>init</CODE> can be used.
</P>
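<P>
A module opens <CODE>Prelude</CODE> in the same way as any other resource.
The following sketch shows the relevant parts of <CODE>MorphoEng</CODE>
with <CODE>Prelude</CODE> opened, so that <CODE>yNoun</CODE> compiles:
</P>
<PRE>
resource MorphoEng = open Prelude in {
  param
    Number = Sg | Pl ;
  oper
    Noun : Type = {s : Number => Str} ;
    mkNoun : Str -> Str -> Noun = \x,y -> {
      s = table {Sg => x ; Pl => y}
      } ;
    yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
}
</PRE>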
<P>
<h4>An intelligent noun paradigm using <CODE>case</CODE> expressions</h4>
</P>
<P>
It may be hard for the user of a resource morphology to pick the right
inflection paradigm. A way to help with this is to define a more intelligent
paradigm, which chooses the ending by first analysing the lemma.
The following variant for English regular nouns puts together all the
previously shown paradigms, and chooses one of them on the basis of
the final letter of the lemma.
</P>
<PRE>
regNoun : Str -> Noun = \s -> case last s of {
  "s" | "z" => mkNoun s (s + "es") ;
  "y"       => mkNoun s (init s + "ies") ;
  _         => mkNoun s (s + "s")
  } ;
</PRE>
<P>
This definition displays many GF expression forms not shown before;
these forms are explained in the following section.
</P>
<P>
The paradigm <CODE>regNoun</CODE> does not give the correct forms for
all nouns. For instance, <i>louse - lice</i> and
<i>fish - fish</i> must be given by using <CODE>mkNoun</CODE>.
Also the word <i>boy</i> would be inflected incorrectly; to prevent
this, either use <CODE>mkNoun</CODE> or modify
<CODE>regNoun</CODE> so that the <CODE>"y"</CODE> case does not
apply if the second-last character is a vowel.
</P>
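<P>
One way to carry out this modification is a nested <CODE>case</CODE> over
the last two characters of the lemma. The following is only a sketch,
assuming that <CODE>last</CODE> and <CODE>init</CODE> from
<CODE>Prelude</CODE> are in scope:
</P>
<PRE>
regNoun : Str -> Noun = \s -> case last s of {
  "s" | "z" => mkNoun s (s + "es") ;
  "y" => case last (init s) of {
    -- vowel before "y": boy - boys
    "a" | "e" | "o" | "u" => mkNoun s (s + "s") ;
    -- consonant before "y": fly - flies
    _ => mkNoun s (init s + "ies")
    } ;
  _ => mkNoun s (s + "s")
  } ;
</PRE>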
<P>
<h4>Pattern matching</h4>
</P>
<P>
Expressions of the <CODE>table</CODE> form are built from lists of
argument-value pairs. These pairs are called the <B>branches</B>
of the table. In addition to constants introduced in
<CODE>param</CODE> definitions, the left-hand side of a branch can more
generally be a <B>pattern</B>, and the computation of selection is
then performed by <B>pattern matching</B>:
</P>
<UL>
<LI>a variable pattern (identifier other than constant parameter) matches anything
<LI>the wild card <CODE>_</CODE> matches anything
<LI>a string literal pattern, e.g. <CODE>"s"</CODE>, matches the same string
<LI>a disjunctive pattern <CODE>P | ... | Q</CODE> matches anything that
one of the disjuncts matches
</UL>

<P>
Pattern matching is performed in the order in which the branches
appear in the table.
</P>
<P>
As syntactic sugar, one-branch tables can be written concisely:
</P>
<PRE>
\\P,...,Q => t  ===  table {P => ... table {Q => t} ...}
</PRE>
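<P>
For instance, a noun whose singular and plural forms are identical, such
as <i>fish</i>, could be given a one-branch table using this shorthand
(the name <CODE>invarNoun</CODE> is only illustrative):
</P>
<PRE>
oper invarNoun : Str -> Noun = \fish -> {
  s = \\_ => fish   -- the same as: table {_ => fish}
  } ;
</PRE>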
<P>
Finally, the <CODE>case</CODE> expressions common in functional
programming languages are syntactic sugar for table selections:
</P>
<PRE>
case e of {...}  ===  table {...} ! e
</PRE>
<P></P>
<P>
<h4>Morphological analysis and morphology quiz</h4>
</P>
<P>
Even though in GF morphology
is mostly seen as an auxiliary of syntax, a morphology once defined
can be used in its own right. The command <CODE>morpho_analyse = ma</CODE>
can be used to read a text and return, for each word, the analyses that
it has in the current concrete syntax.
</P>
<PRE>
> rf bible.txt | morpho_analyse
</PRE>
<P>
Similarly to translation exercises, morphological exercises can
be generated, by the command <CODE>morpho_quiz = mq</CODE>. Usually,
the category is set to something other than <CODE>S</CODE>. For instance,
</P>
<PRE>
> i lib/resource/french/VerbsFre.gf
> morpho_quiz -cat=V

Welcome to GF Morphology Quiz.
...

réapparaître : VFin VCondit Pl P2
réapparaitriez
> No, not réapparaitriez, but
réapparaîtriez
Score 0/1
</PRE>
<P>
Finally, a list of morphological exercises can be generated and saved in a
file for later use, by the command <CODE>morpho_list = ml</CODE>:
</P>
<PRE>
> morpho_list -number=25 -cat=V
</PRE>
<P>
The <CODE>number</CODE> flag gives the number of exercises generated.
</P>
<P>
<h4>Parametric vs. inherent features, agreement</h4>
</P>
<P>
The rule of subject-verb agreement in English says that the verb
phrase must be inflected in the number of the subject. This
means that a noun phrase (functioning as a subject), in some sense
<i>has</i> a number, which it "sends" to the verb. The verb does not
have a number, but must be able to receive whatever number the
subject has. This distinction is nicely represented by the
different linearization types of noun phrases and verb phrases:
</P>
<PRE>
lincat NP = {s : Str ; n : Number} ;
lincat VP = {s : Number => Str} ;
</PRE>
<P>
We say that the number of <CODE>NP</CODE> is an <B>inherent feature</B>,
whereas the number of <CODE>VP</CODE> is <B>parametric</B>.
</P>
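<P>
An inherent feature is fixed when the phrase is built. For instance, with
the linearization types above, a determiner decides the number of the noun
phrase it forms, so its linearization rule fills in the <CODE>n</CODE>
field with a constant:
</P>
<PRE>
lin This cn = {s = "this" ++ cn.s ! Sg ; n = Sg} ;
lin All  cn = {s = "all"  ++ cn.s ! Pl ; n = Pl} ;
</PRE>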
<P>
The agreement rule itself is expressed in the linearization rule of
the predication structure:
</P>
<PRE>
lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
</PRE>
<P>
The following presents a new version of
<CODE>PaleolithicEng</CODE>, assuming an abstract syntax
extended with <CODE>All</CODE> and <CODE>Two</CODE>.
It also assumes that <CODE>MorphoEng</CODE> has a paradigm
<CODE>regVerb</CODE> for regular verbs (which need only be
regular in the present tense).
The reader is invited to inspect the way in which agreement works in
the formation of noun phrases and verb phrases.
</P>
<P>
<h4>English concrete syntax with parameters</h4>
</P>
<PRE>
concrete PaleolithicEng of Paleolithic = open MorphoEng in {
  lincat
    S, A = {s : Str} ;
    VP, CN, V, TV = {s : Number => Str} ;
    NP = {s : Str ; n : Number} ;
  lin
    PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
    UseV v = v ;
    ComplTV tv np = {s = \\n => tv.s ! n ++ np.s} ;
    UseA a = {s = \\n => case n of {Sg => "is" ; Pl => "are"} ++ a.s} ;
    This cn = {s = "this" ++ cn.s ! Sg ; n = Sg} ;
    Indef cn = {s = "a" ++ cn.s ! Sg ; n = Sg} ;
    All cn = {s = "all" ++ cn.s ! Pl ; n = Pl} ;
    Two cn = {s = "two" ++ cn.s ! Pl ; n = Pl} ;
    ModA a cn = {s = \\n => a.s ++ cn.s ! n} ;
    Louse = mkNoun "louse" "lice" ;
    Snake = regNoun "snake" ;
    Green = {s = "green"} ;
    Warm = {s = "warm"} ;
    Laugh = regVerb "laugh" ;
    Sleep = regVerb "sleep" ;
    Kill = regVerb "kill" ;
}
</PRE>
<P></P>
<P>
<h4>Hierarchic parameter types</h4>
</P>
<P>
The reader familiar with a functional programming language such as
<A HREF="http://www.haskell.org">Haskell</A> must have noticed the similarity
between parameter types in GF and algebraic datatypes (<CODE>data</CODE> definitions
in Haskell). The GF parameter types are actually a special case of algebraic
datatypes: the main restriction is that in GF, these types must be finite.
(This restriction makes it possible to invert linearization rules into
parsing methods.)
</P>
<P>
However, finite is not the same thing as enumerated. Even in GF, parameter
constructors can take arguments, provided these arguments are from other
parameter types (recursion is forbidden). Such parameter types impose a
hierarchic order among parameters. They are often useful to define
linguistically accurate parameter systems.
</P>
<P>
To give an example, Swedish adjectives
are inflected in number (singular or plural) and
gender (uter or neuter). These parameters would suggest 2*2=4 different
forms. However, the gender distinction is made only in the singular. Therefore,
it would be inaccurate to define adjective paradigms using the type
<CODE>Gender => Number => Str</CODE>. The following hierarchic definition
yields an accurate system of three adjectival forms.
</P>
<PRE>
param AdjForm = ASg Gender | APl ;
param Gender = Uter | Neuter ;
</PRE>
<P>
In pattern matching, a constructor can have patterns as arguments. For instance,
the adjectival paradigm in which the two singular forms are the same can be defined
as follows:
</P>
<PRE>
oper plattAdj : Str -> AdjForm => Str = \x -> table {
  ASg _ => x ;
  APl   => x + "a"
  } ;
</PRE>
<P></P>
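<P>
A paradigm that does distinguish the genders fills in all three forms
explicitly. The following sketch shows the regular Swedish pattern, as in
<i>gul - gult - gula</i> ("yellow"); the name <CODE>regAdj</CODE> is only
illustrative:
</P>
<PRE>
oper regAdj : Str -> AdjForm => Str = \x -> table {
  ASg Uter   => x ;
  ASg Neuter => x + "t" ;
  APl        => x + "a"
  } ;
</PRE>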
<P>
<h4>Discontinuous constituents</h4>
</P>
<P>
A linearization type may contain more strings than one.
An example where this is useful is English particle
verbs, such as <i>switch off</i>. The linearization of
a sentence may place the object between the verb and the particle:
<i>he switched it off</i>.
</P>
<P>
The first of the following judgements defines transitive verbs as
<B>discontinuous constituents</B>, i.e. as having a linearization
type with two strings and not just one. The second judgement
shows how the constituents are separated by the object in complementization.
</P>
<PRE>
lincat TV = {s : Number => Str ; s2 : Str} ;
lin ComplTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.s2} ;
</PRE>
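<P>
A lexical entry for a particle verb then fills in both fields. The
following sketch (the function name <CODE>SwitchOff</CODE> is illustrative)
gives the verb <i>switch off</i> this linearization type:
</P>
<PRE>
lin SwitchOff = {
  s = table {Sg => "switches" ; Pl => "switch"} ;
  s2 = "off"
  } ;
</PRE>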
<P></P>
<P>
GF currently requires that all fields in linearization records that
hold a string, or a table with value type <CODE>Str</CODE>, have as labels
either <CODE>s</CODE> or <CODE>s</CODE> followed by an integer index
(such as <CODE>s2</CODE>).
</P>
<A NAME="toc15"></A>
<H2>Topics still to be written</H2>
<P>
Free variation
</P>
<P>
Record extension, tuples
</P>
<P>
Predefined types and operations
</P>
<P>
Lexers and unlexers
</P>
<P>
Grammars of formal languages
</P>
<P>
Resource grammars and their reuse
</P>
<P>
Embedded grammars in Haskell and Java
</P>
<P>
Dependent types, variable bindings, semantic definitions
</P>
<P>
Transfer rules
</P>

<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc gf-tutorial2.txt -->
</BODY></HTML>