forked from GitHub/gf-core
5443 lines
126 KiB
HTML
5443 lines
126 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
|
|
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
|
|
<TITLE>Grammatical Framework Tutorial</TITLE>
|
|
</HEAD><BODY BGCOLOR="white" TEXT="black">
|
|
<P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1>
|
|
<FONT SIZE="4">
|
|
<I>Aarne Ranta</I><BR>
|
|
December 2010 (November 2008)
|
|
</FONT></CENTER>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H1>Overview</H1>
|
|
<P>
|
|
This is a hands-on introduction to grammar writing in GF.
|
|
</P>
|
|
<P>
|
|
Main ingredients of GF:
|
|
</P>
|
|
<UL>
|
|
<LI>linguistics
|
|
<LI>functional programming
|
|
</UL>
|
|
|
|
<P>
|
|
Prerequisites:
|
|
</P>
|
|
<UL>
|
|
<LI>some previous experience from some programming language
|
|
<LI>the basics of using computers, e.g. the use of
|
|
text editors and the management of files.
|
|
<LI>knowledge of Unix commands is useful but not necessary
|
|
<LI>knowledge of many natural languages may add fun to experience
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Outline</H2>
|
|
<P>
|
|
<a href="#chaptwo">Lesson 1</a>: a multilingual "Hello World" grammar. English, Finnish, Italian.
|
|
</P>
|
|
<P>
|
|
<a href="#chapthree">Lesson 2</a>: a larger grammar for the domain of food. English and Italian.
|
|
</P>
|
|
<P>
|
|
<a href="#chaptwo">Lesson 3</a>: parameters - morphology and agreement.
|
|
</P>
|
|
<P>
|
|
<a href="#chapfive">Lesson 4</a>: using the resource grammar library.
|
|
</P>
|
|
<P>
|
|
<a href="#chapsix">Lesson 5</a>: semantics - <B>dependent types</B>, <B>variable bindings</B>,
|
|
and <B>semantic definitions</B>.
|
|
</P>
|
|
<P>
|
|
<a href="#chapseven">Lesson 6</a>: implementing formal languages.
|
|
</P>
|
|
<P>
|
|
<a href="#chapeight">Lesson 7</a>: embedded grammar applications.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Slides</H2>
|
|
<P>
|
|
You can chop this tutorial into a set of slides by the command
|
|
</P>
|
|
<PRE>
|
|
htmls gf-tutorial.html
|
|
</PRE>
|
|
<P>
|
|
where the program <CODE>htmls</CODE> is distributed with GF (see below), in
|
|
</P>
|
|
<P>
|
|
<A HREF="http://grammaticalframework.org/src/tools/Htmls.hs"><CODE>GF/src/tools/Htmls.hs</CODE></A>
|
|
</P>
|
|
<P>
|
|
The slides will appear as a set of files beginning with <CODE>01-gf-tutorial.htmls</CODE>.
|
|
</P>
|
|
<P>
|
|
Internal links will not work in the slide format, except for those in the
|
|
upper left corner of each slide, and the links behind the "Contents" link.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H1>Lesson 1: Getting Started with GF</H1>
|
|
<P>
|
|
<a name="chaptwo"></a>
|
|
</P>
|
|
<P>
|
|
Goals:
|
|
</P>
|
|
<UL>
|
|
<LI>install and run GF
|
|
<LI>write the first GF grammar: a "Hello World" grammar in three languages
|
|
<LI>use GF for translation and multilingual generation
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>What GF is</H2>
|
|
<P>
|
|
We use the term GF for three different things:
|
|
</P>
|
|
<UL>
|
|
<LI>a <B>system</B> (computer program) used for working with grammars
|
|
<LI>a <B>programming language</B> in which grammars can be written
|
|
<LI>a <B>theory</B> about grammars and languages
|
|
</UL>
|
|
|
|
<P>
|
|
The GF system is an implementation
|
|
of the GF programming language, which in turn is built on the ideas of the
|
|
GF theory.
|
|
</P>
|
|
<P>
|
|
The focus of this tutorial is on using the GF programming language.
|
|
</P>
|
|
<P>
|
|
At the same time, we learn the way of thinking in the GF theory.
|
|
</P>
|
|
<P>
|
|
We make the grammars run on a computer by
|
|
using the GF system.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>GF grammars and language processing tasks</H2>
|
|
<P>
|
|
A GF program is called a <B>grammar</B>.
|
|
</P>
|
|
<P>
|
|
A grammar defines a language.
|
|
</P>
|
|
<P>
|
|
From this definition, language processing components can be derived:
|
|
</P>
|
|
<UL>
|
|
<LI><B>parsing</B>: to analyse the language
|
|
<LI><B>linearization</B>: to generate the language
|
|
<LI><B>translation</B>: to analyse one language and generate another
|
|
</UL>
|
|
|
|
<P>
|
|
In general, a GF grammar is <B>multilingual</B>:
|
|
</P>
|
|
<UL>
|
|
<LI>many languages in one grammar
|
|
<LI>translations between them
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Getting the GF system</H2>
|
|
<P>
|
|
Open-source free software, downloaded via the GF Homepage:
|
|
</P>
|
|
<P>
|
|
<A HREF="http://grammaticalframework.org/"><CODE>grammaticalframework.org</CODE></A>
|
|
</P>
|
|
<P>
|
|
There you find
|
|
</P>
|
|
<UL>
|
|
<LI>binaries for Linux, Mac OS X, and Windows
|
|
<LI>source code and documentation
|
|
<LI>grammar libraries and examples
|
|
</UL>
|
|
|
|
<P>
|
|
Many examples in this tutorial are
|
|
<A HREF="http://grammaticalframework.org/examples/tutorial">online</A>.
|
|
</P>
|
|
<P>
|
|
Normally you don't have to compile GF yourself.
|
|
But, if you do want to compile GF from source follow the
|
|
instructions in the <A HREF="../gf-developers.html">Developers Guide</A>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Running the GF system</H2>
|
|
<P>
|
|
Type <CODE>gf</CODE> in the Unix (or Cygwin) shell:
|
|
</P>
|
|
<PRE>
|
|
% gf
|
|
</PRE>
|
|
<P>
|
|
You will see GF's welcome message and the prompt <CODE>></CODE>.
|
|
The command
|
|
</P>
|
|
<PRE>
|
|
> help
|
|
</PRE>
|
|
<P>
|
|
will give you a list of available commands.
|
|
</P>
|
|
<P>
|
|
As a common convention, we will use
|
|
</P>
|
|
<UL>
|
|
<LI><CODE>%</CODE> as a prompt that marks system commands
|
|
<LI><CODE>></CODE> as a prompt that marks GF commands
|
|
</UL>
|
|
|
|
<P>
|
|
Thus you should not type these prompts, but only the characters that
|
|
follow them.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>A "Hello World" grammar</H2>
|
|
<P>
|
|
Like most programming language tutorials, we start with a
|
|
program that prints "Hello World" on the terminal.
|
|
</P>
|
|
<P>
|
|
Extra features:
|
|
</P>
|
|
<UL>
|
|
<LI><B>Multilinguality</B>: the message is printed in many languages.
|
|
<LI><B>Reversibility</B>: in addition to printing, you can <B>parse</B> the
|
|
message and <B>translate</B> it to other languages.
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>The program: abstract syntax and concrete syntaxes</H3>
|
|
<P>
|
|
A GF program, in general, is a <B>multilingual grammar</B>. Its main parts
|
|
are
|
|
</P>
|
|
<UL>
|
|
<LI>an <B>abstract syntax</B>
|
|
<LI>one or more <B>concrete syntaxes</B>
|
|
</UL>
|
|
|
|
<P>
|
|
The abstract syntax defines what <B>meanings</B>
|
|
can be expressed in the grammar
|
|
</P>
|
|
<UL>
|
|
<LI><I>Greetings</I>, where we greet a <I>Recipient</I>, which can be
|
|
<I>World</I> or <I>Mum</I> or <I>Friends</I>
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
GF code for the abstract syntax:
|
|
</P>
|
|
<PRE>
|
|
-- a "Hello World" grammar
|
|
abstract Hello = {
|
|
|
|
flags startcat = Greeting ;
|
|
|
|
cat Greeting ; Recipient ;
|
|
|
|
fun
|
|
Hello : Recipient -> Greeting ;
|
|
World, Mum, Friends : Recipient ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
The code has the following parts:
|
|
</P>
|
|
<UL>
|
|
<LI>a <B>comment</B> (optional), saying what the module is doing
|
|
<LI>a <B>module header</B> indicating that it is an abstract syntax
|
|
module named <CODE>Hello</CODE>
|
|
<LI>a <B>module body</B> in braces, consisting of
|
|
<UL>
|
|
<LI>a <B>startcat flag declaration</B> stating that <CODE>Greeting</CODE> is the
|
|
default start category for parsing and generation
|
|
<LI><B>category declarations</B> introducing two categories, i.e. types of meanings
|
|
<LI><B>function declarations</B> introducing three meaning-building functions
|
|
</UL>
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
English concrete syntax (mapping from meanings to strings):
|
|
</P>
|
|
<PRE>
|
|
concrete HelloEng of Hello = {
|
|
|
|
lincat Greeting, Recipient = {s : Str} ;
|
|
|
|
lin
|
|
Hello recip = {s = "hello" ++ recip.s} ;
|
|
World = {s = "world"} ;
|
|
Mum = {s = "mum"} ;
|
|
Friends = {s = "friends"} ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
The major parts of this code are:
|
|
</P>
|
|
<UL>
|
|
<LI>a module header indicating that it is a concrete syntax of the abstract syntax
|
|
<CODE>Hello</CODE>, itself named <CODE>HelloEng</CODE>
|
|
<LI>a module body in curly brackets, consisting of
|
|
<UL>
|
|
<LI><B>linearization type definitions</B> stating that
|
|
<CODE>Greeting</CODE> and <CODE>Recipient</CODE> are <B>records</B> with a <B>string</B> <CODE>s</CODE>
|
|
<LI><B>linearization definitions</B> telling what records are assigned to
|
|
each of the meanings defined in the abstract syntax
|
|
</UL>
|
|
</UL>
|
|
|
|
<P>
|
|
Notice the concatenation <CODE>++</CODE> and the record projection <CODE>.</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Finnish and an Italian concrete syntaxes:
|
|
</P>
|
|
<PRE>
|
|
concrete HelloFin of Hello = {
|
|
lincat Greeting, Recipient = {s : Str} ;
|
|
lin
|
|
Hello recip = {s = "terve" ++ recip.s} ;
|
|
World = {s = "maailma"} ;
|
|
Mum = {s = "äiti"} ;
|
|
Friends = {s = "ystävät"} ;
|
|
}
|
|
|
|
concrete HelloIta of Hello = {
|
|
lincat Greeting, Recipient = {s : Str} ;
|
|
lin
|
|
Hello recip = {s = "ciao" ++ recip.s} ;
|
|
World = {s = "mondo"} ;
|
|
Mum = {s = "mamma"} ;
|
|
Friends = {s = "amici"} ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Using grammars in the GF system</H3>
|
|
<P>
|
|
In order to compile the grammar in GF,
|
|
we create four files, one for each module, named <I>Modulename</I><CODE>.gf</CODE>:
|
|
</P>
|
|
<PRE>
|
|
Hello.gf HelloEng.gf HelloFin.gf HelloIta.gf
|
|
</PRE>
|
|
<P>
|
|
The first GF command: <B>import</B> a grammar.
|
|
</P>
|
|
<PRE>
|
|
> import HelloEng.gf
|
|
</PRE>
|
|
<P>
|
|
All commands also have short names; here:
|
|
</P>
|
|
<PRE>
|
|
> i HelloEng.gf
|
|
</PRE>
|
|
<P>
|
|
The GF system will <B>compile</B> your grammar
|
|
into an internal representation and show the CPU time was consumed, followed
|
|
by a new prompt:
|
|
</P>
|
|
<PRE>
|
|
> i HelloEng.gf
|
|
- compiling Hello.gf... wrote file Hello.gfo 8 msec
|
|
- compiling HelloEng.gf... wrote file HelloEng.gfo 12 msec
|
|
|
|
12 msec
|
|
>
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
You can use GF for <B>parsing</B> (<CODE>parse</CODE> = <CODE>p</CODE>)
|
|
</P>
|
|
<PRE>
|
|
> parse "hello world"
|
|
Hello World
|
|
</PRE>
|
|
<P>
|
|
Parsing takes a <B>string</B> into an <B>abstract syntax tree</B>.
|
|
</P>
|
|
<P>
|
|
The notation for trees is that of <B>function application</B>:
|
|
</P>
|
|
<PRE>
|
|
function argument1 ... argumentn
|
|
</PRE>
|
|
<P>
|
|
Parentheses are only needed for grouping.
|
|
</P>
|
|
<P>
|
|
Parsing something that is not in grammar will fail:
|
|
</P>
|
|
<PRE>
|
|
> parse "hello dad"
|
|
Unknown words: dad
|
|
|
|
> parse "world hello"
|
|
no tree found
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
You can also use GF for <B>linearization</B> (<CODE>linearize = l</CODE>).
|
|
It takes trees into strings:
|
|
</P>
|
|
<PRE>
|
|
> linearize Hello World
|
|
hello world
|
|
</PRE>
|
|
<P>
|
|
<B>Translation</B>: <B>pipe</B> linearization to parsing:
|
|
</P>
|
|
<PRE>
|
|
> import HelloEng.gf
|
|
> import HelloIta.gf
|
|
|
|
> parse -lang=HelloEng "hello mum" | linearize -lang=HelloIta
|
|
ciao mamma
|
|
</PRE>
|
|
<P>
|
|
Default of the language flag (<CODE>-lang</CODE>): the last-imported concrete syntax.
|
|
</P>
|
|
<P>
|
|
<B>Multilingual generation</B>:
|
|
</P>
|
|
<PRE>
|
|
> parse -lang=HelloEng "hello friends" | linearize
|
|
terve ystävät
|
|
ciao amici
|
|
hello friends
|
|
</PRE>
|
|
<P>
|
|
Linearization is by default to all available languages.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on the Hello World grammar</H3>
|
|
<OL>
|
|
<LI>Test the parsing and translation examples shown above, as well as
|
|
some other examples, in different combinations of languages.
|
|
<P></P>
|
|
<LI>Extend the grammar <CODE>Hello.gf</CODE> and some of the
|
|
concrete syntaxes by five new recipients and one new greeting
|
|
form.
|
|
<P></P>
|
|
<LI>Add a concrete syntax for some other
|
|
languages you might know.
|
|
<P></P>
|
|
<LI>Add a pair of greetings that are expressed in one and
|
|
the same way in
|
|
one language and in two different ways in another.
|
|
For instance, <I>good morning</I>
|
|
and <I>good afternoon</I> in English are both expressed
|
|
as <I>buongiorno</I> in Italian.
|
|
Test what happens when you translate <I>buongiorno</I> to English in GF.
|
|
<P></P>
|
|
<LI>Inject errors in the <CODE>Hello</CODE> grammars, for example, leave out
|
|
some line, omit a variable in a <CODE>lin</CODE> rule, or change the name
|
|
in one occurrence
|
|
of a variable. Inspect the error messages generated by GF.
|
|
</OL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Using grammars from outside GF</H2>
|
|
<P>
|
|
You can use the <CODE>gf</CODE> program in a Unix pipe.
|
|
</P>
|
|
<UL>
|
|
<LI>echo a GF command
|
|
<LI>pipe it into GF with grammar names as arguments
|
|
</UL>
|
|
|
|
<PRE>
|
|
% echo "l Hello World" | gf HelloEng.gf HelloFin.gf HelloIta.gf
|
|
</PRE>
|
|
<P>
|
|
You can also write a <B>script</B>, a file containing the lines
|
|
</P>
|
|
<PRE>
|
|
import HelloEng.gf
|
|
import HelloFin.gf
|
|
import HelloIta.gf
|
|
linearize Hello World
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>GF scripts</H2>
|
|
<P>
|
|
If we name this script <CODE>hello.gfs</CODE>, we can do
|
|
</P>
|
|
<PRE>
|
|
$ gf --run <hello.gfs
|
|
|
|
ciao mondo
|
|
terve maailma
|
|
hello world
|
|
</PRE>
|
|
<P>
|
|
The option <CODE>--run</CODE> removes prompts, CPU time, and other messages.
|
|
</P>
|
|
<P>
|
|
See <a href="#chapeight">Lesson 7</a>, for stand-alone programs that don't need the GF system to run.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. (For Unix hackers.) Write a GF application that reads
|
|
an English string from the standard input and writes an Italian
|
|
translation to the output.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>What else can be done with the grammar</H2>
|
|
<P>
|
|
Some more functions that will be covered:
|
|
</P>
|
|
<UL>
|
|
<LI><B>morphological analysis</B>: find out the possible inflection forms of words
|
|
<LI><B>morphological synthesis</B>: generate all inflection forms of words
|
|
<LI><B>random generation</B>: generate random expressions
|
|
<LI><B>corpus generation</B>: generate all expressions
|
|
<LI><B>treebank generation</B>: generate a list of trees with their linearizations
|
|
<LI><B>teaching quizzes</B>: train morphology and translation
|
|
<LI><B>multilingual authoring</B>: create a document in many languages simultaneously
|
|
<LI><B>speech input</B>: optimize a speech recognition system for a grammar
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Embedded grammar applications</H2>
|
|
<P>
|
|
Application programs, using techniques from <a href="#chapeight">Lesson 7</a>:
|
|
</P>
|
|
<UL>
|
|
<LI>compile grammars to new formats, such as speech recognition grammars
|
|
<LI>embed grammars in Java and Haskell programs
|
|
<LI>build applications using compilation and embedding:
|
|
<UL>
|
|
<LI>voice commands
|
|
<LI>spoken language translators
|
|
<LI>dialogue systems
|
|
<LI>user interfaces
|
|
<LI>localization: render the messages printed by a program
|
|
in different languages
|
|
</UL>
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H1>Lesson 2: Designing a grammar for complex phrases</H1>
|
|
<P>
|
|
<a name="chapthree"></a>
|
|
</P>
|
|
<P>
|
|
Goals:
|
|
</P>
|
|
<UL>
|
|
<LI>build a larger grammar: phrases about food in English and Italian
|
|
<LI>learn to write reusable library functions ("operations")
|
|
<LI>learn the basics of GF's module system
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>The abstract syntax Food</H2>
|
|
<P>
|
|
Phrases usable for speaking about food:
|
|
</P>
|
|
<UL>
|
|
<LI>the start category is <CODE>Phrase</CODE>
|
|
<LI>a <CODE>Phrase</CODE> can be built by assigning a <CODE>Quality</CODE> to an <CODE>Item</CODE>
|
|
(e.g. <I>this cheese is Italian</I>)
|
|
<LI>an<CODE>Item</CODE> is build from a <CODE>Kind</CODE> by prefixing <I>this</I> or <I>that</I>
|
|
(e.g. <I>this wine</I>)
|
|
<LI>a <CODE>Kind</CODE> is either <B>atomic</B> (e.g. <I>cheese</I>), or formed
|
|
qualifying a given <CODE>Kind</CODE> with a <CODE>Quality</CODE> (e.g. <I>Italian cheese</I>)
|
|
<LI>a <CODE>Quality</CODE> is either atomic (e.g. <I>Italian</I>,
|
|
or built by modifying a given <CODE>Quality</CODE> with the word <I>very</I> (e.g. <I>very warm</I>)
|
|
</UL>
|
|
|
|
<P>
|
|
Abstract syntax:
|
|
</P>
|
|
<PRE>
|
|
abstract Food = {
|
|
|
|
flags startcat = Phrase ;
|
|
|
|
cat
|
|
Phrase ; Item ; Kind ; Quality ;
|
|
|
|
fun
|
|
Is : Item -> Quality -> Phrase ;
|
|
This, That : Kind -> Item ;
|
|
QKind : Quality -> Kind -> Kind ;
|
|
Wine, Cheese, Fish : Kind ;
|
|
Very : Quality -> Quality ;
|
|
Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
Example <CODE>Phrase</CODE>
|
|
</P>
|
|
<PRE>
|
|
Is (This (QKind Delicious (QKind Italian Wine))) (Very (Very Expensive))
|
|
this delicious Italian wine is very very expensive
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>The concrete syntax FoodEng</H2>
|
|
<PRE>
|
|
concrete FoodEng of Food = {
|
|
|
|
lincat
|
|
Phrase, Item, Kind, Quality = {s : Str} ;
|
|
|
|
lin
|
|
Is item quality = {s = item.s ++ "is" ++ quality.s} ;
|
|
This kind = {s = "this" ++ kind.s} ;
|
|
That kind = {s = "that" ++ kind.s} ;
|
|
QKind quality kind = {s = quality.s ++ kind.s} ;
|
|
Wine = {s = "wine"} ;
|
|
Cheese = {s = "cheese"} ;
|
|
Fish = {s = "fish"} ;
|
|
Very quality = {s = "very" ++ quality.s} ;
|
|
Fresh = {s = "fresh"} ;
|
|
Warm = {s = "warm"} ;
|
|
Italian = {s = "Italian"} ;
|
|
Expensive = {s = "expensive"} ;
|
|
Delicious = {s = "delicious"} ;
|
|
Boring = {s = "boring"} ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Test the grammar for parsing:
|
|
</P>
|
|
<PRE>
|
|
> import FoodEng.gf
|
|
> parse "this delicious wine is very very Italian"
|
|
Is (This (QKind Delicious Wine)) (Very (Very Italian))
|
|
</PRE>
|
|
<P>
|
|
Parse in other categories setting the <CODE>cat</CODE> flag:
|
|
</P>
|
|
<PRE>
|
|
p -cat=Kind "very Italian wine"
|
|
QKind (Very Italian) Wine
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on the Food grammar</H3>
|
|
<OL>
|
|
<LI>Extend the <CODE>Food</CODE> grammar by ten new food kinds and
|
|
qualities, and run the parser with new kinds of examples.
|
|
<P></P>
|
|
<LI>Add a rule that enables question phrases of the form
|
|
<I>is this cheese Italian</I>.
|
|
<P></P>
|
|
<LI>Enable the optional prefixing of
|
|
phrases with the words "excuse me but". Do this in such a way that
|
|
the prefix can occur at most once.
|
|
</OL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Commands for testing grammars</H2>
|
|
<H3>Generating trees and strings</H3>
|
|
<P>
|
|
Random generation (<CODE>generate_random = gr</CODE>): build
|
|
build a random tree in accordance with an abstract syntax:
|
|
</P>
|
|
<PRE>
|
|
> generate_random
|
|
Is (This (QKind Italian Fish)) Fresh
|
|
</PRE>
|
|
<P>
|
|
By using a pipe, random generation can be fed into linearization:
|
|
</P>
|
|
<PRE>
|
|
> generate_random | linearize
|
|
this Italian fish is fresh
|
|
</PRE>
|
|
<P>
|
|
Use the <CODE>number</CODE> flag to generate several trees:
|
|
</P>
|
|
<PRE>
|
|
> gr -number=4 | l
|
|
that wine is boring
|
|
that fresh cheese is fresh
|
|
that cheese is very boring
|
|
this cheese is Italian
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
To generate <I>all</I> phrases that a grammar can produce,
|
|
use <CODE>generate_trees = gt</CODE>.
|
|
</P>
|
|
<PRE>
|
|
> generate_trees | l
|
|
that cheese is very Italian
|
|
that cheese is very boring
|
|
that cheese is very delicious
|
|
...
|
|
this wine is fresh
|
|
this wine is warm
|
|
</PRE>
|
|
<P>
|
|
The default <B>depth</B> is 3; the depth can be
|
|
set by using the <CODE>depth</CODE> flag:
|
|
</P>
|
|
<PRE>
|
|
> generate_trees -depth=2 | l
|
|
</PRE>
|
|
<P>
|
|
What options a command has can be seen by the <CODE>help = h</CODE> command:
|
|
</P>
|
|
<PRE>
|
|
> help gr
|
|
> help gt
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on generation</H3>
|
|
<OL>
|
|
<LI>If the command <CODE>gt</CODE> generated all
|
|
trees in your grammar, it would never terminate. Why?
|
|
<P></P>
|
|
<LI>Measure how many trees the grammar gives with depths 4 and 5,
|
|
respectively. <B>Hint</B>. You can
|
|
use the Unix <B>word count</B> command <CODE>wc</CODE> to count lines.
|
|
</OL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>More on pipes: tracing</H3>
|
|
<P>
|
|
Put the <B>tracing</B> option <CODE>-tr</CODE> to each command whose output you
|
|
want to see:
|
|
</P>
|
|
<PRE>
|
|
> gr -tr | l -tr | p
|
|
|
|
Is (This Cheese) Boring
|
|
this cheese is boring
|
|
Is (This Cheese) Boring
|
|
</PRE>
|
|
<P>
|
|
Useful for test purposes: the pipe above can show
|
|
if a grammar is <B>ambiguous</B>, i.e.
|
|
contains strings that can be parsed in more than one way.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. Extend the <CODE>Food</CODE> grammar so that it produces ambiguous
|
|
strings, and try out the ambiguity test.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Writing and reading files</H3>
|
|
<P>
|
|
To save the outputs into a file, pipe it to the <CODE>write_file = wf</CODE> command,
|
|
</P>
|
|
<PRE>
|
|
> gr -number=10 | linearize | write_file -file=exx.tmp
|
|
</PRE>
|
|
<P>
|
|
To read a file to GF, use the <CODE>read_file = rf</CODE> command,
|
|
</P>
|
|
<PRE>
|
|
> read_file -file=exx.tmp -lines | parse
|
|
</PRE>
|
|
<P>
|
|
The flag <CODE>-lines</CODE> tells GF to read each line of the file separately.
|
|
</P>
|
|
<P>
|
|
Files with examples can be used for <B>regression testing</B>
|
|
of grammars - the most systematic way to do this is by
|
|
<B>treebanks</B>; see <a href="#sectreebank">here</a>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Visualizing trees</H3>
|
|
<P>
|
|
Parentheses give a linear representation of trees,
|
|
useful for the computer.
|
|
</P>
|
|
<P>
|
|
Human eye may prefer to see a visualization: <CODE>visualize_tree = vt</CODE>:
|
|
</P>
|
|
<PRE>
|
|
> parse "this delicious cheese is very Italian" | visualize_tree
|
|
</PRE>
|
|
<P>
|
|
The tree is generated in postscript (<CODE>.ps</CODE>) file. The <CODE>-view</CODE> option is used for
|
|
telling what command to use to view the file. Its default is <CODE>"gv"</CODE>, which works
|
|
on most Linux installations. On a Mac, one would probably write
|
|
</P>
|
|
<PRE>
|
|
> parse "this delicious cheese is very Italian" | visualize_tree -view="open"
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<IMG ALIGN="middle" SRC="mytree.png" BORDER="0" ALT="">
|
|
</P>
|
|
<P>
|
|
This command uses the program <A HREF="http://www.graphviz.org/">Graphviz</A>, which you
|
|
might not have, but which are freely available on the web.
|
|
</P>
|
|
<P>
|
|
You can save the temporary file <CODE>_grph.dot</CODE>,
|
|
which the command <CODE>vt</CODE> produces.
|
|
</P>
|
|
<P>
|
|
Then you can process this file with the <CODE>dot</CODE>
|
|
program (from the Graphviz package).
|
|
</P>
|
|
<PRE>
|
|
% dot -Tpng _grph.dot > mytree.png
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>System commands</H3>
|
|
<P>
|
|
You can give a <B>system command</B> without leaving GF:
|
|
<CODE>!</CODE> followed by a Unix command,
|
|
</P>
|
|
<PRE>
|
|
> ! dot -Tpng grphtmp.dot > mytree.png
|
|
> ! open mytree.png
|
|
</PRE>
|
|
<P>
|
|
A system command may also receive its argument from
|
|
a GF pipes. It then has the name <CODE>sp</CODE> = <CODE>system_pipe</CODE>:
|
|
</P>
|
|
<PRE>
|
|
> generate_trees -depth=4 | sp -command="wc -l"
|
|
</PRE>
|
|
<P>
|
|
This command example returns the number of generated trees.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>.
|
|
Measure how many trees the grammar <CODE>FoodEng</CODE> gives with depths 4 and 5,
|
|
respectively. Use the Unix <B>word count</B> command <CODE>wc</CODE> to count lines, and
|
|
a system pipe from a GF command into a Unix command.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>An Italian concrete syntax</H2>
|
|
<P>
|
|
<a name="secanitalian"></a>
|
|
</P>
|
|
<P>
|
|
Just (?) replace English words with their dictionary equivalents:
|
|
</P>
|
|
<PRE>
|
|
concrete FoodIta of Food = {
|
|
|
|
lincat
|
|
Phrase, Item, Kind, Quality = {s : Str} ;
|
|
|
|
lin
|
|
Is item quality = {s = item.s ++ "è" ++ quality.s} ;
|
|
This kind = {s = "questo" ++ kind.s} ;
|
|
That kind = {s = "quel" ++ kind.s} ;
|
|
QKind quality kind = {s = kind.s ++ quality.s} ;
|
|
Wine = {s = "vino"} ;
|
|
Cheese = {s = "formaggio"} ;
|
|
Fish = {s = "pesce"} ;
|
|
Very quality = {s = "molto" ++ quality.s} ;
|
|
Fresh = {s = "fresco"} ;
|
|
Warm = {s = "caldo"} ;
|
|
Italian = {s = "italiano"} ;
|
|
Expensive = {s = "caro"} ;
|
|
Delicious = {s = "delizioso"} ;
|
|
Boring = {s = "noioso"} ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Not just replacing words:
|
|
</P>
|
|
<P>
|
|
The order of a quality and the kind it modifies is changed in
|
|
</P>
|
|
<PRE>
|
|
QKind quality kind = {s = kind.s ++ quality.s} ;
|
|
</PRE>
|
|
<P>
|
|
Thus Italian says <CODE>vino italiano</CODE> for <CODE>Italian wine</CODE>.
|
|
</P>
|
|
<P>
|
|
(Some Italian adjectives
|
|
are put before the noun. This distinction can be controlled by parameters,
|
|
which are introduced in <a href="#chaptwo">Lesson 3</a>.)
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on multilinguality</H3>
|
|
<OL>
|
|
<LI>Write a concrete syntax of <CODE>Food</CODE> for some other language.
|
|
You will probably end up with grammatically incorrect
|
|
linearizations - but don't
|
|
worry about this yet.
|
|
<P></P>
|
|
<LI>If you have written <CODE>Food</CODE> for German, Swedish, or some
|
|
other language, test with random or exhaustive generation what constructs
|
|
come out incorrect, and prepare a list of those ones that cannot be helped
|
|
with the currently available fragment of GF. You can return to your list
|
|
after having worked out <a href="#chaptwo">Lesson 3</a>.
|
|
</OL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Free variation</H2>
|
|
<P>
|
|
Semantically indistinguishable ways of expressing a thing.
|
|
</P>
|
|
<P>
|
|
The <B>variants</B> construct of GF expresses free variation. For example,
|
|
</P>
|
|
<PRE>
|
|
lin Delicious = {s = "delicious" | "exquisit" | "tasty"} ;
|
|
</PRE>
|
|
<P>
|
|
By default, the <CODE>linearize</CODE> command
|
|
shows only the first variant from such lists; to see them
|
|
all, use the option <CODE>-all</CODE>:
|
|
</P>
|
|
<PRE>
|
|
> p "this exquisit wine is delicious" | l -all
|
|
this delicious wine is delicious
|
|
this delicious wine is exquisit
|
|
...
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
An equivalent notation for variants is
|
|
</P>
|
|
<PRE>
|
|
lin Delicious = {s = variants {"delicious" ; "exquisit" ; "tasty"}} ;
|
|
</PRE>
|
|
<P>
|
|
This notation also allows the limiting case: an empty variant list,
|
|
</P>
|
|
<PRE>
|
|
variants {}
|
|
</PRE>
|
|
<P>
|
|
It can be used e.g. if a word lacks a certain inflection form.
|
|
</P>
|
|
<P>
|
|
Free variation works for all types in concrete syntax; all terms in
|
|
a variant list must be of the same type.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>More application of multilingual grammars</H2>
|
|
<H3>Multilingual treebanks</H3>
|
|
<P>
|
|
<a name="sectreebank"></a>
|
|
</P>
|
|
<P>
|
|
<B>Multilingual treebank</B>: a set of trees with their
|
|
linearizations in different languages:
|
|
</P>
|
|
<PRE>
|
|
> gr -number=2 | l -treebank
|
|
|
|
Is (That Cheese) (Very Boring)
|
|
quel formaggio è molto noioso
|
|
that cheese is very boring
|
|
|
|
Is (That Cheese) Fresh
|
|
quel formaggio è fresco
|
|
that cheese is fresh
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Translation quiz</H3>
|
|
<P>
|
|
<CODE>translation_quiz = tq</CODE>:
|
|
generate random sentences, display them in one language, and check the user's
|
|
answer given in another language.
|
|
</P>
|
|
<PRE>
|
|
> translation_quiz -from=FoodEng -to=FoodIta
|
|
|
|
Welcome to GF Translation Quiz.
|
|
The quiz is over when you have done at least 10 examples
|
|
with at least 75 % success.
|
|
You can interrupt the quiz by entering a line consisting of a dot ('.').
|
|
|
|
this fish is warm
|
|
questo pesce è caldo
|
|
> Yes.
|
|
Score 1/1
|
|
|
|
this cheese is Italian
|
|
questo formaggio è noioso
|
|
> No, not questo formaggio è noioso, but
|
|
questo formaggio è italiano
|
|
|
|
Score 1/2
|
|
this fish is expensive
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Context-free grammars and GF</H2>
|
|
<H3>The "cf" grammar format</H3>
|
|
<P>
|
|
The grammar <CODE>FoodEng</CODE> can be written in a BNF format as follows:
|
|
</P>
|
|
<PRE>
|
|
Is. Phrase ::= Item "is" Quality ;
|
|
That. Item ::= "that" Kind ;
|
|
This. Item ::= "this" Kind ;
|
|
QKind. Kind ::= Quality Kind ;
|
|
Cheese. Kind ::= "cheese" ;
|
|
Fish. Kind ::= "fish" ;
|
|
Wine. Kind ::= "wine" ;
|
|
Italian. Quality ::= "Italian" ;
|
|
Boring. Quality ::= "boring" ;
|
|
Delicious. Quality ::= "delicious" ;
|
|
Expensive. Quality ::= "expensive" ;
|
|
Fresh. Quality ::= "fresh" ;
|
|
Very. Quality ::= "very" Quality ;
|
|
Warm. Quality ::= "warm" ;
|
|
</PRE>
|
|
<P>
|
|
GF can convert BNF grammars into GF.
|
|
BNF files are recognized by the file name suffix <CODE>.cf</CODE> (for <B>context-free</B>):
|
|
</P>
|
|
<PRE>
|
|
> import food.cf
|
|
</PRE>
|
|
<P>
|
|
The compiler creates separate abstract and concrete modules internally.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Restrictions of context-free grammars</H3>
|
|
<P>
|
|
Separating concrete and abstract syntax allows
|
|
three deviations from context-free grammar:
|
|
</P>
|
|
<UL>
|
|
<LI><B>permutation</B>: changing the order of constituents
|
|
<LI><B>suppression</B>: omitting constituents
|
|
<LI><B>reduplication</B>: repeating constituents
|
|
</UL>
|
|
|
|
<P>
|
|
<B>Exercise</B>. Define the non-context-free
|
|
copy language <CODE>{x x | x <- (a|b)*}</CODE> in GF.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Modules and files</H2>
|
|
<P>
|
|
GF uses suffixes to recognize different file formats:
|
|
</P>
|
|
<UL>
|
|
<LI>Source files: <I>Modulename</I><CODE>.gf</CODE>
|
|
<LI>Target files: <I>Modulename</I><CODE>.gfo</CODE>
|
|
</UL>
|
|
|
|
<P>
|
|
Importing generates target from source:
|
|
</P>
|
|
<PRE>
|
|
> i FoodEng.gf
|
|
- compiling Food.gf... wrote file Food.gfo 16 msec
|
|
- compiling FoodEng.gf... wrote file FoodEng.gfo 20 msec
|
|
</PRE>
|
|
<P>
|
|
The <CODE>.gfo</CODE> format (="GF Object") is precompiled GF, which is
|
|
faster to load than source GF (<CODE>.gf</CODE>).
|
|
</P>
|
|
<P>
|
|
When reading a module, GF decides whether
|
|
to use an existing <CODE>.gfo</CODE> file or to generate
|
|
a new one, by looking at modification times.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. What happens when you import <CODE>FoodEng.gf</CODE> for
|
|
a second time? Try this in different situations:
|
|
</P>
|
|
<UL>
|
|
<LI>Right after importing it the first time (the modules are kept in
|
|
the memory of GF and need no reloading).
|
|
<LI>After issuing the command <CODE>empty</CODE> (<CODE>e</CODE>), which clears the memory
|
|
of GF.
|
|
<LI>After making a small change in <CODE>FoodEng.gf</CODE>, be it only an added space.
|
|
<LI>After making a change in <CODE>Food.gf</CODE>.
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Using operations and resource modules</H2>
|
|
<H3>Operation definitions</H3>
|
|
<P>
|
|
The golden rule of functional programmin:
|
|
</P>
|
|
<P>
|
|
<I>Whenever you find yourself programming by copy-and-paste, write a function instead.</I>
|
|
</P>
|
|
<P>
|
|
Functions in concrete syntax are defined using the keyword <CODE>oper</CODE> (for
|
|
<B>operation</B>), distinct from <CODE>fun</CODE> for the sake of clarity.
|
|
</P>
|
|
<P>
|
|
Example:
|
|
</P>
|
|
<PRE>
|
|
oper ss : Str -> {s : Str} = \x -> {s = x} ;
|
|
</PRE>
|
|
<P>
|
|
The operation can be <B>applied</B> to an argument, and GF will
|
|
<B>compute</B> the value:
|
|
</P>
|
|
<PRE>
|
|
ss "boy" ===> {s = "boy"}
|
|
</PRE>
|
|
<P>
|
|
The symbol <CODE>===></CODE> will be used for computation.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Notice the <B>lambda abstraction</B> form
|
|
</P>
|
|
<UL>
|
|
<LI><CODE>\</CODE><I>x</I> <CODE>-></CODE> <I>t</I>
|
|
</UL>
|
|
|
|
<P>
|
|
This is read:
|
|
</P>
|
|
<UL>
|
|
<LI>function with variable <I>x</I> and <B>function body</B> <I>t</I>
|
|
</UL>
|
|
|
|
<P>
|
|
For lambda abstraction with multiple arguments, we have the shorthand
|
|
</P>
|
|
<PRE>
|
|
\x,y -> t === \x -> \y -> t
|
|
</PRE>
|
|
<P>
|
|
Linearization rules actually use syntactic
|
|
sugar for abstraction:
|
|
</P>
|
|
<PRE>
|
|
lin f x = t === lin f = \x -> t
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>The ``resource`` module type</H3>
|
|
<P>
|
|
The <CODE>resource</CODE> module type is used to package
|
|
<CODE>oper</CODE> definitions into reusable resources.
|
|
</P>
|
|
<PRE>
|
|
resource StringOper = {
|
|
oper
|
|
SS : Type = {s : Str} ;
|
|
ss : Str -> SS = \x -> {s = x} ;
|
|
cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;
|
|
prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Opening a resource</H3>
|
|
<P>
|
|
Any number of <CODE>resource</CODE> modules can be
|
|
<B>open</B>ed in a <CODE>concrete</CODE> syntax.
|
|
</P>
|
|
<PRE>
|
|
concrete FoodEng of Food = open StringOper in {
|
|
|
|
lincat
|
|
S, Item, Kind, Quality = SS ;
|
|
|
|
lin
|
|
Is item quality = cc item (prefix "is" quality) ;
|
|
This k = prefix "this" k ;
|
|
That k = prefix "that" k ;
|
|
QKind k q = cc k q ;
|
|
Wine = ss "wine" ;
|
|
Cheese = ss "cheese" ;
|
|
Fish = ss "fish" ;
|
|
Very = prefix "very" ;
|
|
Fresh = ss "fresh" ;
|
|
Warm = ss "warm" ;
|
|
Italian = ss "Italian" ;
|
|
Expensive = ss "expensive" ;
|
|
Delicious = ss "delicious" ;
|
|
Boring = ss "boring" ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Partial application</H3>
|
|
<P>
|
|
<a name="secpartapp"></a>
|
|
</P>
|
|
<P>
|
|
The rule
|
|
</P>
|
|
<PRE>
|
|
lin This k = prefix "this" k ;
|
|
</PRE>
|
|
<P>
|
|
can be written more concisely
|
|
</P>
|
|
<PRE>
|
|
lin This = prefix "this" ;
|
|
</PRE>
|
|
<P>
|
|
Part of the art in functional programming:
|
|
decide the order of arguments in a function,
|
|
so that partial application can be used as much as possible.
|
|
</P>
|
|
<P>
|
|
For instance, <CODE>prefix</CODE> is typically applied to
|
|
linearization variables with constant strings. Hence we
|
|
put the <CODE>Str</CODE> argument before the <CODE>SS</CODE> argument.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. Define an operation <CODE>infix</CODE> analogous to <CODE>prefix</CODE>,
|
|
such that it allows you to write
|
|
</P>
|
|
<PRE>
|
|
lin Is = infix "is" ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Testing resource modules</H3>
|
|
<P>
|
|
Import with the flag <CODE>-retain</CODE>,
|
|
</P>
|
|
<PRE>
|
|
> import -retain StringOper.gf
|
|
</PRE>
|
|
<P>
|
|
Compute the value with <CODE>compute_concrete = cc</CODE>,
|
|
</P>
|
|
<PRE>
|
|
> compute_concrete prefix "in" (ss "addition")
|
|
{s : Str = "in" ++ "addition"}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Grammar architecture</H2>
|
|
<P>
|
|
<a name="secarchitecture"></a>
|
|
</P>
|
|
<H3>Extending a grammar</H3>
|
|
<P>
|
|
A new module can <B>extend</B> an old one:
|
|
</P>
|
|
<PRE>
|
|
abstract Morefood = Food ** {
|
|
cat
|
|
Question ;
|
|
fun
|
|
QIs : Item -> Quality -> Question ;
|
|
Pizza : Kind ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
Parallel to the abstract syntax, extensions can
|
|
be built for concrete syntaxes:
|
|
</P>
|
|
<PRE>
|
|
concrete MorefoodEng of Morefood = FoodEng ** {
|
|
lincat
|
|
Question = {s : Str} ;
|
|
lin
|
|
QIs item quality = {s = "is" ++ item.s ++ quality.s} ;
|
|
Pizza = {s = "pizza"} ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
The effect of extension: all of the contents of the extended
|
|
and extending modules are put together.
|
|
</P>
|
|
<P>
|
|
In other words: the new module <B>inherits</B> the contents of the old module.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Simultaneous extension and opening:
|
|
</P>
|
|
<PRE>
|
|
concrete MorefoodIta of Morefood = FoodIta ** open StringOper in {
|
|
lincat
|
|
Question = SS ;
|
|
lin
|
|
QIs item quality = ss (item.s ++ "è" ++ quality.s) ;
|
|
Pizza = ss "pizza" ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
Resource modules can extend other resource modules - thus it is
|
|
possible to build resource hierarchies.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Multiple inheritance</H3>
|
|
<P>
|
|
Extend several grammars at the same time:
|
|
</P>
|
|
<PRE>
|
|
abstract Foodmarket = Food, Fruit, Mushroom ** {
|
|
fun
|
|
FruitKind : Fruit -> Kind ;
|
|
MushroomKind : Mushroom -> Kind ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
where
|
|
</P>
|
|
<PRE>
|
|
abstract Fruit = {
|
|
cat Fruit ;
|
|
fun Apple, Peach : Fruit ;
|
|
}
|
|
|
|
abstract Mushroom = {
|
|
cat Mushroom ;
|
|
fun Cep, Agaric : Mushroom ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<B>Exercise</B>. Refactor <CODE>Food</CODE> by taking apart <CODE>Wine</CODE> into a special
|
|
<CODE>Drink</CODE> module.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H1>Lesson 3: Grammars with parameters</H1>
|
|
<P>
|
|
<a name="chapfour"></a>
|
|
</P>
|
|
<P>
|
|
Goals:
|
|
</P>
|
|
<UL>
|
|
<LI>implement sophisticated linguistic structures:
|
|
<UL>
|
|
<LI>morphology: the inflection of words
|
|
<LI>agreement: rules for selecting word forms in syntactic combinations
|
|
</UL>
|
|
</UL>
|
|
|
|
<UL>
|
|
<LI>Cover all GF constructs for concrete syntax
|
|
</UL>
|
|
|
|
<P>
|
|
It is possible to skip this chapter and go directly
|
|
to the next, since the use of the GF Resource Grammar library
|
|
makes it unnecessary to use parameters: they
|
|
could be left to library implementors.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>The problem: words have to be inflected</H2>
|
|
<P>
|
|
Plural forms are needed in things like
|
|
<center>
|
|
<I>these Italian wines are delicious</I>
|
|
</center>
|
|
This requires two things:
|
|
</P>
|
|
<UL>
|
|
<LI>the <B>inflection</B> of nouns and verbs in singular and plural
|
|
<LI>the <B>agreement</B> of the verb to subject:
|
|
the verb must have the same number as the subject
|
|
</UL>
|
|
|
|
<P>
|
|
Different languages have different types of inflection and agreement.
|
|
</P>
|
|
<UL>
|
|
<LI>Italian has also gender (masculine vs. feminine).
|
|
</UL>
|
|
|
|
<P>
|
|
In a multilingual grammar,
|
|
we want to ignore such distinctions in abstract syntax.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. Make a list of the possible forms that nouns,
|
|
adjectives, and verbs can have in some languages that you know.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Parameters and tables</H2>
|
|
<P>
|
|
We define the <B>parameter type</B> of number in English by
|
|
a new form of judgement:
|
|
</P>
|
|
<PRE>
|
|
param Number = Sg | Pl ;
|
|
</PRE>
|
|
<P>
|
|
This judgement defines the parameter type <CODE>Number</CODE> by listing
|
|
its two <B>constructors</B>, <CODE>Sg</CODE> and <CODE>Pl</CODE>
|
|
(singular and plural).
|
|
</P>
|
|
<P>
|
|
We give <CODE>Kind</CODE> a linearization type that has a <B>table</B> depending on number:
|
|
</P>
|
|
<PRE>
|
|
lincat Kind = {s : Number => Str} ;
|
|
</PRE>
|
|
<P>
|
|
The <B>table type</B> <CODE>Number => Str</CODE> is similar a function type
|
|
(<CODE>Number -> Str</CODE>).
|
|
</P>
|
|
<P>
|
|
Difference: the argument must be a parameter type. Then
|
|
the argument-value pairs can be listed in a finite table.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Here is a table:
|
|
</P>
|
|
<PRE>
|
|
lin Cheese = {
|
|
s = table {
|
|
Sg => "cheese" ;
|
|
Pl => "cheeses"
|
|
}
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
The table has <B>branches</B>, with a <B>pattern</B> on the
|
|
left of the arrow <CODE>=></CODE> and a <B>value</B> on the right.
|
|
</P>
|
|
<P>
|
|
The application of a table is done by the <B>selection</B> operator <CODE>!</CODE>.
|
|
</P>
|
|
<P>
|
|
It which is computed by <B>pattern matching</B>: return
|
|
the value from the first branch whose pattern matches the
|
|
argument. For instance,
|
|
</P>
|
|
<PRE>
|
|
table {Sg => "cheese" ; Pl => "cheeses"} ! Pl
|
|
===> "cheeses"
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
<B>Case expressions</B> are syntactic sugar:
|
|
</P>
|
|
<PRE>
|
|
case e of {...} === table {...} ! e
|
|
</PRE>
|
|
<P>
|
|
Since they are familiar to Haskell and ML programmers, they can come out handy
|
|
when writing GF programs.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Constructors can take arguments from other parameter types.
|
|
</P>
|
|
<P>
|
|
Example: forms of English verbs (except <I>be</I>):
|
|
</P>
|
|
<PRE>
|
|
param VerbForm = VPresent Number | VPast | VPastPart | VPresPart ;
|
|
</PRE>
|
|
<P>
|
|
Fact expressed: only present tense has number variation.
|
|
</P>
|
|
<P>
|
|
Example table: the forms of the verb <I>drink</I>:
|
|
</P>
|
|
<PRE>
|
|
table {
|
|
VPresent Sg => "drinks" ;
|
|
VPresent Pl => "drink" ;
|
|
VPast => "drank" ;
|
|
VPastPart => "drunk" ;
|
|
VPresPart => "drinking"
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<B>Exercise</B>. In an earlier exercise (previous section),
|
|
you made a list of the possible
|
|
forms that nouns, adjectives, and verbs can have in some languages that
|
|
you know. Now take some of the results and implement them by
|
|
using parameter type definitions and tables. Write them into a <CODE>resource</CODE>
|
|
module, which you can test by using the command <CODE>compute_concrete</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Inflection tables and paradigms</H2>
|
|
<P>
|
|
A morphological <B>paradigm</B> is a formula telling how a class of
|
|
words is inflected.
|
|
</P>
|
|
<P>
|
|
From the GF point of view, a paradigm is a function that takes
|
|
a <B>lemma</B> (also known as a <B>dictionary form</B>, or a <B>citation form</B>) and
|
|
returns an inflection table.
|
|
</P>
|
|
<P>
|
|
The following operation defines the regular noun paradigm of English:
|
|
</P>
|
|
<PRE>
|
|
oper regNoun : Str -> {s : Number => Str} = \dog -> {
|
|
s = table {
|
|
Sg => dog ;
|
|
Pl => dog + "s"
|
|
}
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
The <B>gluing</B> operator <CODE>+</CODE> glues strings to one <B>token</B>:
|
|
</P>
|
|
<PRE>
|
|
(regNoun "cheese").s ! Pl ===> "cheese" + "s" ===> "cheeses"
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
A more complex example: regular verbs,
|
|
</P>
|
|
<PRE>
|
|
oper regVerb : Str -> {s : VerbForm => Str} = \talk -> {
|
|
s = table {
|
|
VPresent Sg => talk + "s" ;
|
|
VPresent Pl => talk ;
|
|
VPresPart => talk + "ing" ;
|
|
_ => talk + "ed"
|
|
}
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
The catch-all case for the past tense and the past participle
|
|
uses a <B>wild card</B> pattern <CODE>_</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on morphology</H3>
|
|
<OL>
|
|
<LI>Identify cases in which the <CODE>regNoun</CODE> paradigm does not
|
|
apply in English, and implement some alternative paradigms.
|
|
<P></P>
|
|
<LI>Implement some regular paradigms for other languages you have
|
|
considered in earlier exercises.
|
|
</OL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Using parameters in concrete syntax</H2>
|
|
<P>
|
|
Purpose: a more radical
|
|
variation between languages
|
|
than just the use of different words and word orders.
|
|
</P>
|
|
<P>
|
|
We add to the grammar <CODE>Food</CODE> two rules for forming plural items:
|
|
</P>
|
|
<PRE>
|
|
fun These, Those : Kind -> Item ;
|
|
</PRE>
|
|
<P>
|
|
We also add a noun which in Italian has the feminine case:
|
|
</P>
|
|
<PRE>
|
|
fun Pizza : Kind ;
|
|
</PRE>
|
|
<P>
|
|
This will force us to deal with gender-
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Agreement</H3>
|
|
<P>
|
|
In English, the phrase-forming rule
|
|
</P>
|
|
<PRE>
|
|
fun Is : Item -> Quality -> Phrase ;
|
|
</PRE>
|
|
<P>
|
|
is affected by the number because of <B>subject-verb agreement</B>:
|
|
the verb of a sentence must be inflected in the number of the subject,
|
|
</P>
|
|
<PRE>
|
|
Is (This Pizza) Warm ===> "this pizza is warm"
|
|
Is (These Pizza) Warm ===> "these pizzas are warm"
|
|
</PRE>
|
|
<P>
|
|
It is the <B>copula</B> (the verb <I>be</I>) that is affected:
|
|
</P>
|
|
<PRE>
|
|
oper copula : Number -> Str = \n ->
|
|
case n of {
|
|
Sg => "is" ;
|
|
Pl => "are"
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
The <B>subject</B> <CODE>Item</CODE> must have such a number to provide to the copula:
|
|
</P>
|
|
<PRE>
|
|
lincat Item = {s : Str ; n : Number} ;
|
|
</PRE>
|
|
<P>
|
|
Now we can write
|
|
</P>
|
|
<PRE>
|
|
lin Is item qual = {s = item.s ++ copula item.n ++ qual.s} ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Determiners</H3>
|
|
<P>
|
|
How does an <CODE>Item</CODE> subject receive its number? The rules
|
|
</P>
|
|
<PRE>
|
|
fun This, These : Kind -> Item ;
|
|
</PRE>
|
|
<P>
|
|
add <B>determiners</B>, either <I>this</I> or <I>these</I>, which
|
|
require different <I>this pizza</I> vs.
|
|
<I>these pizzas</I>.
|
|
</P>
|
|
<P>
|
|
Thus <CODE>Kind</CODE> must have both singular and plural forms:
|
|
</P>
|
|
<PRE>
|
|
lincat Kind = {s : Number => Str} ;
|
|
</PRE>
|
|
<P>
|
|
We can write
|
|
</P>
|
|
<PRE>
|
|
lin This kind = {
|
|
s = "this" ++ kind.s ! Sg ;
|
|
n = Sg
|
|
} ;
|
|
|
|
lin These kind = {
|
|
s = "these" ++ kind.s ! Pl ;
|
|
n = Pl
|
|
} ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
To avoid copy-and-paste, we can factor out the pattern of determination,
|
|
</P>
|
|
<PRE>
|
|
oper det :
|
|
Str -> Number -> {s : Number => Str} -> {s : Str ; n : Number} =
|
|
\det,n,kind -> {
|
|
s = det ++ kind.s ! n ;
|
|
n = n
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
Now we can write
|
|
</P>
|
|
<PRE>
|
|
lin This = det Sg "this" ;
|
|
lin These = det Pl "these" ;
|
|
</PRE>
|
|
<P>
|
|
In a more <B>lexicalized</B> grammar, determiners would be a category:
|
|
</P>
|
|
<PRE>
|
|
lincat Det = {s : Str ; n : Number} ;
|
|
fun Det : Det -> Kind -> Item ;
|
|
lin Det det kind = {
|
|
s = det.s ++ kind.s ! det.n ;
|
|
n = det.n
|
|
} ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Parametric vs. inherent features</H3>
|
|
<P>
|
|
<CODE>Kind</CODE>s have number as a <B>parametric feature</B>: both singular and plural
|
|
can be formed,
|
|
</P>
|
|
<PRE>
|
|
lincat Kind = {s : Number => Str} ;
|
|
</PRE>
|
|
<P>
|
|
<CODE>Item</CODE>s have number as an <B>inherent feature</B>: they are inherently either
|
|
singular or plural,
|
|
</P>
|
|
<PRE>
|
|
lincat Item = {s : Str ; n : Number} ;
|
|
</PRE>
|
|
<P>
|
|
Italian <CODE>Kind</CODE> will have parametric number and inherent gender:
|
|
</P>
|
|
<PRE>
|
|
lincat Kind = {s : Number => Str ; g : Gender} ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Questions to ask when designing parameters:
|
|
</P>
|
|
<UL>
|
|
<LI>existence: what forms are possible to build by morphological and
|
|
other means?
|
|
<LI>need: what features are expected via agreement or government?
|
|
</UL>
|
|
|
|
<P>
|
|
Dictionaries give good advice:
|
|
<center>
|
|
<B>uomo</B>, pl. <I>uomini</I>, n.m. "man"
|
|
</center>
|
|
tells that <I>uomo</I> is a masculine noun with the plural form <I>uomini</I>.
|
|
Hence, parametric number and an inherent gender.
|
|
</P>
|
|
<P>
|
|
For words, inherent features are usually given as lexical information.
|
|
</P>
|
|
<P>
|
|
For combinations, they are <I>inherited</I> from some part of the construction
|
|
(typically the one called the <B>head</B>). Italian modification:
|
|
</P>
|
|
<PRE>
|
|
lin QKind qual kind =
|
|
let gen = kind.g in {
|
|
s = table {n => kind.s ! n ++ qual.s ! gen ! n} ;
|
|
g = gen
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
Notice
|
|
</P>
|
|
<UL>
|
|
<LI><B>local definition</B> (<CODE>let</CODE> expression)
|
|
<LI><B>variable pattern</B> <CODE>n</CODE>
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>An English concrete syntax for Foods with parameters</H2>
|
|
<P>
|
|
We use some string operations from the library <CODE>Prelude</CODE> are used.
|
|
</P>
|
|
<PRE>
|
|
concrete FoodsEng of Foods = open Prelude in {
|
|
|
|
lincat
|
|
S, Quality = SS ;
|
|
Kind = {s : Number => Str} ;
|
|
Item = {s : Str ; n : Number} ;
|
|
|
|
lin
|
|
Is item quality = ss (item.s ++ copula item.n ++ quality.s) ;
|
|
This = det Sg "this" ;
|
|
That = det Sg "that" ;
|
|
These = det Pl "these" ;
|
|
Those = det Pl "those" ;
|
|
QKind quality kind = {s = table {n => quality.s ++ kind.s ! n}} ;
|
|
Wine = regNoun "wine" ;
|
|
Cheese = regNoun "cheese" ;
|
|
Fish = noun "fish" "fish" ;
|
|
Pizza = regNoun "pizza" ;
|
|
Very = prefixSS "very" ;
|
|
Fresh = ss "fresh" ;
|
|
Warm = ss "warm" ;
|
|
Italian = ss "Italian" ;
|
|
Expensive = ss "expensive" ;
|
|
Delicious = ss "delicious" ;
|
|
Boring = ss "boring" ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<PRE>
|
|
param
|
|
Number = Sg | Pl ;
|
|
|
|
oper
|
|
det : Number -> Str -> {s : Number => Str} -> {s : Str ; n : Number} =
|
|
\n,d,cn -> {
|
|
s = d ++ cn.s ! n ;
|
|
n = n
|
|
} ;
|
|
noun : Str -> Str -> {s : Number => Str} =
|
|
\man,men -> {s = table {
|
|
Sg => man ;
|
|
Pl => men
|
|
}
|
|
} ;
|
|
regNoun : Str -> {s : Number => Str} =
|
|
\car -> noun car (car + "s") ;
|
|
copula : Number -> Str =
|
|
\n -> case n of {
|
|
Sg => "is" ;
|
|
Pl => "are"
|
|
} ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>More on inflection paradigms</H2>
|
|
<P>
|
|
<a name="secinflection"></a>
|
|
</P>
|
|
<P>
|
|
Let us extend the English noun paradigms so that we can
|
|
deal with all nouns, not just the regular ones. The goal is to
|
|
provide a morphology module that makes it easy to
|
|
add words to a lexicon.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Worst-case functions</H3>
|
|
<P>
|
|
We perform <B>data abstraction</B> from the type
|
|
of nouns by writing a a <B>worst-case function</B>:
|
|
</P>
|
|
<PRE>
|
|
oper Noun : Type = {s : Number => Str} ;
|
|
|
|
oper mkNoun : Str -> Str -> Noun = \x,y -> {
|
|
s = table {
|
|
Sg => x ;
|
|
Pl => y
|
|
}
|
|
} ;
|
|
|
|
oper regNoun : Str -> Noun = \x -> mkNoun x (x + "s") ;
|
|
</PRE>
|
|
<P>
|
|
Then we can define
|
|
</P>
|
|
<PRE>
|
|
lincat N = Noun ;
|
|
lin Mouse = mkNoun "mouse" "mice" ;
|
|
lin House = regNoun "house" ;
|
|
</PRE>
|
|
<P>
|
|
where the underlying types are not seen.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
We are free to change the undelying definitions, e.g.
|
|
add <B>case</B> (nominative or genitive) to noun inflection:
|
|
</P>
|
|
<PRE>
|
|
param Case = Nom | Gen ;
|
|
|
|
oper Noun : Type = {s : Number => Case => Str} ;
|
|
</PRE>
|
|
<P>
|
|
Now we have to redefine the worst-case function
|
|
</P>
|
|
<PRE>
|
|
oper mkNoun : Str -> Str -> Noun = \x,y -> {
|
|
s = table {
|
|
Sg => table {
|
|
Nom => x ;
|
|
Gen => x + "'s"
|
|
} ;
|
|
Pl => table {
|
|
Nom => y ;
|
|
Gen => y + case last y of {
|
|
"s" => "'" ;
|
|
_ => "'s"
|
|
}
|
|
}
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
But up from this level, we can retain the old definitions
|
|
</P>
|
|
<PRE>
|
|
lin Mouse = mkNoun "mouse" "mice" ;
|
|
oper regNoun : Str -> Noun = \x -> mkNoun x (x + "s") ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
In the last definition of <CODE>mkNoun</CODE>, we used a case expression
|
|
on the last character of the plural, as well as the <CODE>Prelude</CODE>
|
|
operation
|
|
</P>
|
|
<PRE>
|
|
last : Str -> Str ;
|
|
</PRE>
|
|
<P>
|
|
returning the string consisting of the last character.
|
|
</P>
|
|
<P>
|
|
The case expression uses <B>pattern matching over strings</B>, which
|
|
is supported in GF, alongside with pattern matching over
|
|
parameters.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Smart paradigms</H3>
|
|
<P>
|
|
The regular <I>dog</I>-<I>dogs</I> paradigm has
|
|
predictable variations:
|
|
</P>
|
|
<UL>
|
|
<LI>nouns ending with an <I>y</I>: <I>fly</I>-<I>flies</I>, except if
|
|
a vowel precedes the <I>y</I>: <I>boy</I>-<I>boys</I>
|
|
<LI>nouns ending with <I>s</I>, <I>ch</I>, and a number of
|
|
other endings: <I>bus</I>-<I>buses</I>, <I>leech</I>-<I>leeches</I>
|
|
</UL>
|
|
|
|
<P>
|
|
We could provide alternative paradigms:
|
|
</P>
|
|
<PRE>
|
|
noun_y : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
|
|
noun_s : Str -> Noun = \bus -> mkNoun bus (bus + "es") ;
|
|
</PRE>
|
|
<P>
|
|
(The Prelude function <CODE>init</CODE> drops the last character of a token.)
|
|
</P>
|
|
<P>
|
|
Drawbacks:
|
|
</P>
|
|
<UL>
|
|
<LI>it can be difficult to select the correct paradigm
|
|
<LI>it can be difficult to remember the names of the different paradigms
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Better solution: a <B>smart paradigm</B>:
|
|
</P>
|
|
<PRE>
|
|
regNoun : Str -> Noun = \w ->
|
|
let
|
|
ws : Str = case w of {
|
|
_ + ("a" | "e" | "i" | "o") + "o" => w + "s" ; -- bamboo
|
|
_ + ("s" | "x" | "sh" | "o") => w + "es" ; -- bus, hero
|
|
_ + "z" => w + "zes" ;-- quiz
|
|
_ + ("a" | "e" | "o" | "u") + "y" => w + "s" ; -- boy
|
|
x + "y" => x + "ies" ;-- fly
|
|
_ => w + "s" -- car
|
|
}
|
|
in
|
|
mkNoun w ws
|
|
</PRE>
|
|
<P>
|
|
GF has <B>regular expression patterns</B>:
|
|
</P>
|
|
<UL>
|
|
<LI><B>disjunctive patterns</B> <I>P</I> <CODE>|</CODE> <I>Q</I>
|
|
<LI><B>concatenation patterns</B> <I>P</I> <CODE>+</CODE> <I>Q</I>
|
|
</UL>
|
|
|
|
<P>
|
|
The patterns are ordered in such a way that, for instance,
|
|
the suffix <CODE>"oo"</CODE> prevents <I>bamboo</I> from matching the suffix
|
|
<CODE>"o"</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on regular patterns</H3>
|
|
<OL>
|
|
<LI>The same rules that form plural nouns in English also
|
|
apply in the formation of third-person singular verbs.
|
|
Write a regular verb paradigm that uses this idea, but first
|
|
rewrite <CODE>regNoun</CODE> so that the analysis needed to build <I>s</I>-forms
|
|
is factored out as a separate <CODE>oper</CODE>, which is shared with
|
|
<CODE>regVerb</CODE>.
|
|
<P></P>
|
|
<LI>Extend the verb paradigms to cover all verb forms
|
|
in English, with special care taken of variations with the suffix
|
|
<I>ed</I> (e.g. <I>try</I>-<I>tried</I>, <I>use</I>-<I>used</I>).
|
|
<P></P>
|
|
<LI>Implement the German <B>Umlaut</B> operation on word stems.
|
|
The operation changes the vowel of the stressed stem syllable as follows:
|
|
<I>a</I> to <I>ä</I>, <I>au</I> to <I>äu</I>, <I>o</I> to <I>ö</I>, and <I>u</I> to <I>ü</I>. You
|
|
can assume that the operation only takes syllables as arguments. Test the
|
|
operation to see whether it correctly changes <I>Arzt</I> to <I>Ärzt</I>,
|
|
<I>Baum</I> to <I>Bäum</I>, <I>Topf</I> to <I>Töpf</I>, and <I>Kuh</I> to <I>Küh</I>.
|
|
</OL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Function types with variables</H3>
|
|
<P>
|
|
In <a href="#chapsix">Lesson 5</a>, <B>dependent function types</B> need a notation
|
|
that binds a variable to the argument type, as in
|
|
</P>
|
|
<PRE>
|
|
switchOff : (k : Kind) -> Action k
|
|
</PRE>
|
|
<P>
|
|
Function types <I>without</I> variables are actually a shorthand:
|
|
</P>
|
|
<PRE>
|
|
PredVP : NP -> VP -> S
|
|
</PRE>
|
|
<P>
|
|
means
|
|
</P>
|
|
<PRE>
|
|
PredVP : (x : NP) -> (y : VP) -> S
|
|
</PRE>
|
|
<P>
|
|
or any other naming of the variables.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Sometimes variables shorten the code, since they can share a type:
|
|
</P>
|
|
<PRE>
|
|
octuple : (x,y,z,u,v,w,s,t : Str) -> Str
|
|
</PRE>
|
|
<P>
|
|
If a bound variable is not used, it can be replaced by a wildcard:
|
|
</P>
|
|
<PRE>
|
|
octuple : (_,_,_,_,_,_,_,_ : Str) -> Str
|
|
</PRE>
|
|
<P>
|
|
A good practice is to indicate the number of arguments:
|
|
</P>
|
|
<PRE>
|
|
octuple : (x1,_,_,_,_,_,_,x8 : Str) -> Str
|
|
</PRE>
|
|
<P>
|
|
For inflection paradigms, it is handy to use heuristic variable names,
|
|
looking like the expected forms:
|
|
</P>
|
|
<PRE>
|
|
mkNoun : (mouse,mice : Str) -> Noun
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Separating operation types and definitions</H3>
|
|
<P>
|
|
In librarues, it is useful to group type signatures separately from
|
|
definitions. It is possible to divide an <CODE>oper</CODE> judgement,
|
|
</P>
|
|
<PRE>
|
|
oper regNoun : Str -> Noun ;
|
|
oper regNoun s = mkNoun s (s + "s") ;
|
|
</PRE>
|
|
<P>
|
|
and put the parts in different places.
|
|
</P>
|
|
<P>
|
|
With the <CODE>interface</CODE> and <CODE>instance</CODE> module types
|
|
(see <a href="#secinterface">here</a>): the parts can even be put to different files.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Overloading of operations</H3>
|
|
<P>
|
|
<B>Overloading</B>: different functions can be given the same name, as e.g. in C++.
|
|
</P>
|
|
<P>
|
|
The compiler performs <B>overload resolution</B>, which works as long as the
|
|
functions have different types.
|
|
</P>
|
|
<P>
|
|
In GF, the functions must be grouped together in <CODE>overload</CODE> groups.
|
|
</P>
|
|
<P>
|
|
Example: different ways to define nouns in English:
|
|
</P>
|
|
<PRE>
|
|
oper mkN : overload {
|
|
mkN : (dog : Str) -> Noun ; -- regular nouns
|
|
mkN : (mouse,mice : Str) -> Noun ; -- irregular nouns
|
|
}
|
|
</PRE>
|
|
<P>
|
|
Cf. dictionaries: if the
|
|
word is regular, just one form is needed. If it is irregular,
|
|
more forms are given.
|
|
</P>
|
|
<P>
|
|
The definition can be given separately, or at the same time, as the types:
|
|
</P>
|
|
<PRE>
|
|
oper mkN = overload {
|
|
mkN : (dog : Str) -> Noun = regNoun ;
|
|
mkN : (mouse,mice : Str) -> Noun = mkNoun ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
<B>Exercise</B>. Design a system of English verb paradigms presented by
|
|
an overload group.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Morphological analysis and morphology quiz</H3>
|
|
<P>
|
|
The command <CODE>morpho_analyse = ma</CODE>
|
|
can be used to read a text and return for each word its analyses
|
|
(in the current grammar):
|
|
</P>
|
|
<PRE>
|
|
> read_file bible.txt | morpho_analyse
|
|
</PRE>
|
|
<P>
|
|
The command <CODE>morpho_quiz = mq</CODE> generates inflection exercises.
|
|
</P>
|
|
<PRE>
|
|
% gf -path=alltenses:prelude $GF_LIB_PATH/alltenses/IrregFre.gfo
|
|
|
|
> morpho_quiz -cat=V
|
|
|
|
Welcome to GF Morphology Quiz.
|
|
...
|
|
|
|
réapparaître : VFin VCondit Pl P2
|
|
réapparaitriez
|
|
> No, not réapparaitriez, but
|
|
réapparaîtriez
|
|
Score 0/1
|
|
</PRE>
|
|
<P>
|
|
To create a list for later use, use the command <CODE>morpho_list = ml</CODE>
|
|
</P>
|
|
<PRE>
|
|
> morpho_list -number=25 -cat=V | write_file exx.txt
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>The Italian Foods grammar</H2>
|
|
<P>
|
|
<a name="secitalian"></a>
|
|
</P>
|
|
<P>
|
|
Parameters include not only number but also gender.
|
|
</P>
|
|
<PRE>
|
|
concrete FoodsIta of Foods = open Prelude in {
|
|
|
|
param
|
|
Number = Sg | Pl ;
|
|
Gender = Masc | Fem ;
|
|
</PRE>
|
|
<P>
|
|
Qualities are inflected for gender and number, whereas kinds
|
|
have a parametric number and an inherent gender.
|
|
Items have an inherent number and gender.
|
|
</P>
|
|
<PRE>
|
|
lincat
|
|
Phr = SS ;
|
|
Quality = {s : Gender => Number => Str} ;
|
|
Kind = {s : Number => Str ; g : Gender} ;
|
|
Item = {s : Str ; g : Gender ; n : Number} ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
A Quality is an adjective, with one form for each gender-number combination.
|
|
</P>
|
|
<PRE>
|
|
oper
|
|
adjective : (_,_,_,_ : Str) -> {s : Gender => Number => Str} =
|
|
\nero,nera,neri,nere -> {
|
|
s = table {
|
|
Masc => table {
|
|
Sg => nero ;
|
|
Pl => neri
|
|
} ;
|
|
Fem => table {
|
|
Sg => nera ;
|
|
Pl => nere
|
|
}
|
|
}
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
Regular adjectives work by adding endings to the stem.
|
|
</P>
|
|
<PRE>
|
|
regAdj : Str -> {s : Gender => Number => Str} = \nero ->
|
|
let ner = init nero
|
|
in adjective nero (ner + "a") (ner + "i") (ner + "e") ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
For noun inflection, we are happy to give the two forms and the gender
|
|
explicitly:
|
|
</P>
|
|
<PRE>
|
|
noun : Str -> Str -> Gender -> {s : Number => Str ; g : Gender} =
|
|
\vino,vini,g -> {
|
|
s = table {
|
|
Sg => vino ;
|
|
Pl => vini
|
|
} ;
|
|
g = g
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
We need only number variation for the copula.
|
|
</P>
|
|
<PRE>
|
|
copula : Number -> Str =
|
|
\n -> case n of {
|
|
Sg => "è" ;
|
|
Pl => "sono"
|
|
} ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Determination is more complex than in English, because of gender:
|
|
</P>
|
|
<PRE>
|
|
det : Number -> Str -> Str -> {s : Number => Str ; g : Gender} ->
|
|
{s : Str ; g : Gender ; n : Number} =
|
|
\n,m,f,cn -> {
|
|
s = case cn.g of {Masc => m ; Fem => f} ++ cn.s ! n ;
|
|
g = cn.g ;
|
|
n = n
|
|
} ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
The complete set of linearization rules:
|
|
</P>
|
|
<PRE>
|
|
lin
|
|
Is item quality =
|
|
ss (item.s ++ copula item.n ++ quality.s ! item.g ! item.n) ;
|
|
This = det Sg "questo" "questa" ;
|
|
That = det Sg "quel" "quella" ;
|
|
These = det Pl "questi" "queste" ;
|
|
Those = det Pl "quei" "quelle" ;
|
|
QKind quality kind = {
|
|
s = \\n => kind.s ! n ++ quality.s ! kind.g ! n ;
|
|
g = kind.g
|
|
} ;
|
|
Wine = noun "vino" "vini" Masc ;
|
|
Cheese = noun "formaggio" "formaggi" Masc ;
|
|
Fish = noun "pesce" "pesci" Masc ;
|
|
Pizza = noun "pizza" "pizze" Fem ;
|
|
Very qual = {s = \\g,n => "molto" ++ qual.s ! g ! n} ;
|
|
Fresh = adjective "fresco" "fresca" "freschi" "fresche" ;
|
|
Warm = regAdj "caldo" ;
|
|
Italian = regAdj "italiano" ;
|
|
Expensive = regAdj "caro" ;
|
|
Delicious = regAdj "delizioso" ;
|
|
Boring = regAdj "noioso" ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on using parameters</H3>
|
|
<OL>
|
|
<LI>Experiment with multilingual generation and translation in the
|
|
<CODE>Foods</CODE> grammars.
|
|
<P></P>
|
|
<LI>Add items, qualities, and determiners to the grammar,
|
|
and try to get their inflection and inherent features right.
|
|
<P></P>
|
|
<LI>Write a concrete syntax of <CODE>Food</CODE> for a language of your choice,
|
|
now aiming for complete grammatical correctness by the use of parameters.
|
|
<P></P>
|
|
<LI>Measure the size of the context-free grammar corresponding to
|
|
<CODE>FoodsIta</CODE>. You can do this by printing the grammar in the context-free format
|
|
(<CODE>print_grammar -printer=bnf</CODE>) and counting the lines.
|
|
</OL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Discontinuous constituents</H2>
|
|
<P>
|
|
A linearization record may contain more strings than one, and those
|
|
strings can be put apart in linearization.
|
|
</P>
|
|
<P>
|
|
Example: English particle
|
|
verbs, (<I>switch off</I>). The object can appear between:
|
|
</P>
|
|
<P>
|
|
<I>he switched it off</I>
|
|
</P>
|
|
<P>
|
|
The verb <I>switch off</I> is called a
|
|
<B>discontinuous constituents</B>.
|
|
</P>
|
|
<P>
|
|
We can define transitive verbs and their combinations as follows:
|
|
</P>
|
|
<PRE>
|
|
lincat TV = {s : Number => Str ; part : Str} ;
|
|
|
|
fun AppTV : Item -> TV -> Item -> Phrase ;
|
|
|
|
lin AppTV subj tv obj =
|
|
{s = subj.s ++ tv.s ! subj.n ++ obj.s ++ tv.part} ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<B>Exercise</B>. Define the language <CODE>a^n b^n c^n</CODE> in GF, i.e.
|
|
any number of <I>a</I>'s followed by the same number of <I>b</I>'s and
|
|
the same number of <I>c</I>'s. This language is not context-free,
|
|
but can be defined in GF by using discontinuous constituents.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Strings at compile time vs. run time</H2>
|
|
<P>
|
|
Tokens are created in the following ways:
|
|
</P>
|
|
<UL>
|
|
<LI>quoted string: <CODE>"foo"</CODE>
|
|
<LI>gluing : <CODE>t + s</CODE>
|
|
<LI>predefined operations <CODE>init, tail, tk, dp</CODE>
|
|
<LI>pattern matching over strings
|
|
</UL>
|
|
|
|
<P>
|
|
Since <I>tokens must be known at compile time</I>,
|
|
the above operations may not be applied to <B>run-time variables</B>
|
|
(i.e. variables that stand for function arguments in linearization rules).
|
|
</P>
|
|
<P>
|
|
Hence it is not legal to write
|
|
</P>
|
|
<PRE>
|
|
cat Noun ;
|
|
fun Plural : Noun -> Noun ;
|
|
lin Plural n = {s = n.s + "s"} ;
|
|
</PRE>
|
|
<P>
|
|
because <CODE>n</CODE> is a run-time variable. Also
|
|
</P>
|
|
<PRE>
|
|
lin Plural n = {s = (regNoun n).s ! Pl} ;
|
|
</PRE>
|
|
<P>
|
|
is incorrect with <CODE>regNoun</CODE> as defined <a href="#secinflection">here</a>, because the run-time
|
|
variable is eventually sent to string pattern matching and gluing.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
How to write tokens together without a space?
|
|
</P>
|
|
<PRE>
|
|
lin Question p = {s = p + "?"} ;
|
|
</PRE>
|
|
<P>
|
|
is incorrect.
|
|
</P>
|
|
<P>
|
|
The way to go is to use an <B>unlexer</B> that creates correct spacing
|
|
after linearization.
|
|
</P>
|
|
<P>
|
|
Correspondingly, a <B>lexer</B> that e.g. analyses <CODE>"warm?"</CODE> into
|
|
to tokens is needed before parsing.
|
|
This topic will be covered in <a href="#seclexing">here</a>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Supplementary constructs for concrete syntax</H3>
|
|
<H4>Record extension and subtyping</H4>
|
|
<P>
|
|
The symbol <CODE>**</CODE> is used for both record types and record objects.
|
|
</P>
|
|
<PRE>
|
|
lincat TV = Verb ** {c : Case} ;
|
|
|
|
lin Follow = regVerb "folgen" ** {c = Dative} ;
|
|
</PRE>
|
|
<P>
|
|
<CODE>TV</CODE> becomes a <B>subtype</B> of <CODE>Verb</CODE>.
|
|
</P>
|
|
<P>
|
|
If <I>T</I> is a subtype of <I>R</I>, an object of <I>T</I> can be used whenever
|
|
an object of <I>R</I> is required.
|
|
</P>
|
|
<P>
|
|
<B>Covariance</B>: a function returning a record <I>T</I> as value can
|
|
also be used to return a value of a supertype <I>R</I>.
|
|
</P>
|
|
<P>
|
|
<B>Contravariance</B>: a function taking an <I>R</I> as argument
|
|
can also be applied to any object of a subtype <I>T</I>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H4>Tuples and product types</H4>
|
|
<P>
|
|
Product types and tuples are syntactic sugar for record types and records:
|
|
</P>
|
|
<PRE>
|
|
T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn}
|
|
<t1, ..., tn> === {p1 = T1 ; ... ; pn = Tn}
|
|
</PRE>
|
|
<P>
|
|
Thus the labels <CODE>p1, p2,...</CODE> are hard-coded.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H4>Prefix-dependent choices</H4>
|
|
<P>
|
|
English indefinite article:
|
|
</P>
|
|
<PRE>
|
|
oper artIndef : Str =
|
|
pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ;
|
|
</PRE>
|
|
<P>
|
|
Thus
|
|
</P>
|
|
<PRE>
|
|
artIndef ++ "cheese" ---> "a" ++ "cheese"
|
|
artIndef ++ "apple" ---> "an" ++ "apple"
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H1>Lesson 4: Using the resource grammar library</H1>
|
|
<P>
|
|
<a name="chapfive"></a>
|
|
</P>
|
|
<P>
|
|
Goals:
|
|
</P>
|
|
<UL>
|
|
<LI>navigate in the GF resource grammar library and use it in applications
|
|
<LI>get acquainted with basic linguistic categories
|
|
<LI>write functors to achieve maximal sharing of code in multilingual grammars
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>The coverage of the library</H2>
|
|
<P>
|
|
The current 12 resource languages are
|
|
</P>
|
|
<UL>
|
|
<LI><CODE>Bul</CODE>garian
|
|
<LI><CODE>Cat</CODE>alan
|
|
<LI><CODE>Dan</CODE>ish
|
|
<LI><CODE>Eng</CODE>lish
|
|
<LI><CODE>Fin</CODE>nish
|
|
<LI><CODE>Fre</CODE>nch
|
|
<LI><CODE>Ger</CODE>man
|
|
<LI><CODE>Ita</CODE>lian
|
|
<LI><CODE>Nor</CODE>wegian
|
|
<LI><CODE>Rus</CODE>sian
|
|
<LI><CODE>Spa</CODE>nish
|
|
<LI><CODE>Swe</CODE>dish
|
|
</UL>
|
|
|
|
<P>
|
|
The first three letters (<CODE>Eng</CODE> etc) are used in grammar module names
|
|
(ISO 639 standard).
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>The structure of the library</H2>
|
|
<P>
|
|
<a name="seclexical"></a>
|
|
</P>
|
|
<P>
|
|
Semantic grammars (up to now in this tutorial):
|
|
a grammar defines a system of meanings (abstract syntax) and
|
|
tells how they are expressed(concrete syntax).
|
|
</P>
|
|
<P>
|
|
Resource grammars (as usual in linguistic tradition):
|
|
a grammar specifies the <B>grammatically correct combinations of words</B>,
|
|
whatever their meanings are.
|
|
</P>
|
|
<P>
|
|
With resource grammars, we can achieve a
|
|
wider coverage than with semantic grammars.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Lexical vs. phrasal rules</H3>
|
|
<P>
|
|
A resource grammar has two kinds of categories and two kinds of rules:
|
|
</P>
|
|
<UL>
|
|
<LI>lexical:
|
|
<UL>
|
|
<LI>lexical categories, to classify words
|
|
<LI>lexical rules, to define words and their properties
|
|
<P></P>
|
|
</UL>
|
|
<LI>phrasal (combinatorial, syntactic):
|
|
<UL>
|
|
<LI>phrasal categories, to classify phrases of arbitrary size
|
|
<LI>phrasal rules, to combine phrases into larger phrases
|
|
</UL>
|
|
</UL>
|
|
|
|
<P>
|
|
GE makes no formal distinction between these two kinds.
|
|
</P>
|
|
<P>
|
|
But it is a good discipline to follow.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Lexical categories</H3>
|
|
<P>
|
|
Two kinds of lexical categories:
|
|
</P>
|
|
<UL>
|
|
<LI><B>closed</B>:
|
|
<UL>
|
|
<LI>a finite number of words
|
|
<LI>seldom extended in the history of language
|
|
<LI>structural words / function words, e.g.
|
|
<PRE>
|
|
Conj ; -- conjunction e.g. "and"
|
|
QuantSg ; -- singular quantifier e.g. "this"
|
|
QuantPl ; -- plural quantifier e.g. "this"
|
|
</PRE>
|
|
<P></P>
|
|
</UL>
|
|
<LI><B>open</B>:
|
|
<UL>
|
|
<LI>new words are added all the time
|
|
<LI>content words, e.g.
|
|
<PRE>
|
|
N ; -- noun e.g. "pizza"
|
|
A ; -- adjective e.g. "good"
|
|
V ; -- verb e.g. "sleep"
|
|
</PRE>
|
|
</UL>
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Lexical rules</H3>
|
|
<P>
|
|
Closed classes: module <CODE>Syntax</CODE>. In the <CODE>Foods</CODE> grammar, we need
|
|
</P>
|
|
<PRE>
|
|
this_QuantSg, that_QuantSg : QuantSg ;
|
|
these_QuantPl, those_QuantPl : QuantPl ;
|
|
very_AdA : AdA ;
|
|
</PRE>
|
|
<P>
|
|
Naming convention: word followed by the category (so we can
|
|
distinguish the quantifier <I>that</I> from the conjunction <I>that</I>).
|
|
</P>
|
|
<P>
|
|
Open classes have no objects in <CODE>Syntax</CODE>. Words are
|
|
built as they are needed in applications: if we have
|
|
</P>
|
|
<PRE>
|
|
fun Wine : Kind ;
|
|
</PRE>
|
|
<P>
|
|
we will define
|
|
</P>
|
|
<PRE>
|
|
lin Wine = mkN "wine" ;
|
|
</PRE>
|
|
<P>
|
|
where we use <CODE>mkN</CODE> from <CODE>ParadigmsEng</CODE>:
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Resource lexicon</H3>
|
|
<P>
|
|
Alternative concrete syntax for
|
|
</P>
|
|
<PRE>
|
|
fun Wine : Kind ;
|
|
</PRE>
|
|
<P>
|
|
is to provide a <B>resource lexicon</B>, which contains definitions such as
|
|
</P>
|
|
<PRE>
|
|
oper wine_N : N = mkN "wine" ;
|
|
</PRE>
|
|
<P>
|
|
so that we can write
|
|
</P>
|
|
<PRE>
|
|
lin Wine = wine_N ;
|
|
</PRE>
|
|
<P>
|
|
Advantages:
|
|
</P>
|
|
<UL>
|
|
<LI>we accumulate a reusable lexicon
|
|
<LI>we can use a <a href="#secfunctor">here</a> to speed up multilingual grammar implementation
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Phrasal categories</H3>
|
|
<P>
|
|
In <CODE>Foods</CODE>, we need just four phrasal categories:
|
|
</P>
|
|
<PRE>
|
|
Cl ; -- clause e.g. "this pizza is good"
|
|
NP ; -- noun phrase e.g. "this pizza"
|
|
CN ; -- common noun e.g. "warm pizza"
|
|
AP ; -- adjectival phrase e.g. "very warm"
|
|
</PRE>
|
|
<P>
|
|
Clauses are similar to sentences (<CODE>S</CODE>), but without a
|
|
fixed tense and mood; see <a href="#secextended">here</a> for how they relate.
|
|
</P>
|
|
<P>
|
|
Common nouns are made into noun phrases by adding determiners.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Syntactic combinations</H3>
|
|
<P>
|
|
We need the following combinations:
|
|
</P>
|
|
<PRE>
|
|
mkCl : NP -> AP -> Cl ; -- e.g. "this pizza is very warm"
|
|
mkNP : QuantSg -> CN -> NP ; -- e.g. "this pizza"
|
|
mkNP : QuantPl -> CN -> NP ; -- e.g. "these pizzas"
|
|
mkCN : AP -> CN -> CN ; -- e.g. "warm pizza"
|
|
mkAP : AdA -> AP -> AP ; -- e.g. "very warm"
|
|
</PRE>
|
|
<P>
|
|
We also need <B>lexical insertion</B>, to form phrases from single words:
|
|
</P>
|
|
<PRE>
|
|
mkCN : N -> NP ;
|
|
mkAP : A -> AP ;
|
|
</PRE>
|
|
<P>
|
|
Naming convention: to construct a <I>C</I>, use a function <CODE>mk</CODE><I>C</I>.
|
|
</P>
|
|
<P>
|
|
Heavy overloading: the current library
|
|
(version 1.2) has 23 operations named <CODE>mkNP</CODE>!
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Example syntactic combination</H3>
|
|
<P>
|
|
The sentence
|
|
<center>
|
|
<I>these very warm pizzas are Italian</I>
|
|
</center>
|
|
can be built as follows:
|
|
</P>
|
|
<PRE>
|
|
mkCl
|
|
(mkNP these_QuantPl
|
|
(mkCN (mkAP very_AdA (mkAP warm_A)) (mkCN pizza_CN)))
|
|
(mkAP italian_AP)
|
|
</PRE>
|
|
<P>
|
|
The task now: to define the concrete syntax of <CODE>Foods</CODE> so that
|
|
this syntactic tree gives the value of linearizing the semantic tree
|
|
</P>
|
|
<PRE>
|
|
Is (These (QKind (Very Warm) Pizza)) Italian
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>The resource API</H2>
|
|
<P>
|
|
Language-specific and language-independent parts - roughly,
|
|
</P>
|
|
<UL>
|
|
<LI>the syntax API <CODE>Syntax</CODE><I>L</I> has the same types and
|
|
functions for all languages <I>L</I>
|
|
<LI>the morphology API <CODE>Paradigms</CODE><I>L</I> has partly
|
|
different types and functions
|
|
for different languages <I>L</I>
|
|
</UL>
|
|
|
|
<P>
|
|
Full API documentation on-line: the <B>resource synopsis</B>,
|
|
</P>
|
|
<P>
|
|
<A HREF="http://grammaticalframework.org/lib/doc/synopsis.html"><CODE>grammaticalframework.org/lib/resource/doc/synopsis.html</CODE></A>
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>A miniature resource API: categories</H3>
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Category</TH>
|
|
<TH>Explanation</TH>
|
|
<TH COLSPAN="2">Example</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>Cl</CODE></TD>
|
|
<TD>clause (sentence), with all tenses</TD>
|
|
<TD><I>she looks at this</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>AP</CODE></TD>
|
|
<TD>adjectival phrase</TD>
|
|
<TD><I>very warm</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>CN</CODE></TD>
|
|
<TD>common noun (without determiner)</TD>
|
|
<TD><I>red house</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>NP</CODE></TD>
|
|
<TD>noun phrase (subject or object)</TD>
|
|
<TD><I>the red house</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>AdA</CODE></TD>
|
|
<TD>adjective-modifying adverb,</TD>
|
|
<TD><I>very</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>QuantSg</CODE></TD>
|
|
<TD>singular quantifier</TD>
|
|
<TD><I>these</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>QuantPl</CODE></TD>
|
|
<TD>plural quantifier</TD>
|
|
<TD><I>this</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>A</CODE></TD>
|
|
<TD>one-place adjective</TD>
|
|
<TD><I>warm</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>N</CODE></TD>
|
|
<TD>common noun</TD>
|
|
<TD><I>house</I></TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>A miniature resource API: rules</H3>
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Function</TH>
|
|
<TH>Type</TH>
|
|
<TH COLSPAN="2">Example</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkCl</CODE></TD>
|
|
<TD><CODE>NP -> AP -> Cl</CODE></TD>
|
|
<TD><I>John is very old</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkNP</CODE></TD>
|
|
<TD><CODE>QuantSg -> CN -> NP</CODE></TD>
|
|
<TD><I>this old man</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkNP</CODE></TD>
|
|
<TD><CODE>QuantPl -> CN -> NP</CODE></TD>
|
|
<TD><I>these old man</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkCN</CODE></TD>
|
|
<TD><CODE>N -> CN</CODE></TD>
|
|
<TD><I>house</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkCN</CODE></TD>
|
|
<TD><CODE>AP -> CN -> CN</CODE></TD>
|
|
<TD><I>very big blue house</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkAP</CODE></TD>
|
|
<TD><CODE>A -> AP</CODE></TD>
|
|
<TD><I>old</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkAP</CODE></TD>
|
|
<TD><CODE>AdA -> AP -> AP</CODE></TD>
|
|
<TD><I>very very old</I></TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>A miniature resource API: structural words</H3>
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Function</TH>
|
|
<TH>Type</TH>
|
|
<TH COLSPAN="2">In English</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>this_QuantSg</CODE></TD>
|
|
<TD><CODE>QuantSg</CODE></TD>
|
|
<TD><I>this</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>that_QuantSg</CODE></TD>
|
|
<TD><CODE>QuantSg</CODE></TD>
|
|
<TD><I>that</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>these_QuantPl</CODE></TD>
|
|
<TD><CODE>QuantPl</CODE></TD>
|
|
<TD><I>this</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>those_QuantPl</CODE></TD>
|
|
<TD><CODE>QuantPl</CODE></TD>
|
|
<TD><I>that</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>very_AdA</CODE></TD>
|
|
<TD><CODE>AdA</CODE></TD>
|
|
<TD><I>very</I></TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>A miniature resource API: paradigms</H3>
|
|
<P>
|
|
From <CODE>ParadigmsEng</CODE>:
|
|
</P>
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Function</TH>
|
|
<TH COLSPAN="2">Type</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkN</CODE></TD>
|
|
<TD><CODE>(dog : Str) -> N</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkN</CODE></TD>
|
|
<TD><CODE>(man,men : Str) -> N</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkA</CODE></TD>
|
|
<TD><CODE>(cold : Str) -> A</CODE></TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P>
|
|
From <CODE>ParadigmsIta</CODE>:
|
|
</P>
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Function</TH>
|
|
<TH COLSPAN="2">Type</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkN</CODE></TD>
|
|
<TD><CODE>(vino : Str) -> N</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkA</CODE></TD>
|
|
<TD><CODE>(caro : Str) -> A</CODE></TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>A miniature resource API: more paradigms</H3>
|
|
<P>
|
|
From <CODE>ParadigmsGer</CODE>:
|
|
</P>
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Function</TH>
|
|
<TH COLSPAN="2">Type</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>Gender</CODE></TD>
|
|
<TD><CODE>Type</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>masculine</CODE></TD>
|
|
<TD><CODE>Gender</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>feminine</CODE></TD>
|
|
<TD><CODE>Gender</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>neuter</CODE></TD>
|
|
<TD><CODE>Gender</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkN</CODE></TD>
|
|
<TD><CODE>(Stufe : Str) -> N</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkN</CODE></TD>
|
|
<TD><CODE>(Bild,Bilder : Str) -> Gender -> N</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkA</CODE></TD>
|
|
<TD><CODE>(klein : Str) -> A</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkA</CODE></TD>
|
|
<TD><CODE>(gut,besser,beste : Str) -> A</CODE></TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P>
|
|
From <CODE>ParadigmsFin</CODE>:
|
|
</P>
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Function</TH>
|
|
<TH COLSPAN="2">Type</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkN</CODE></TD>
|
|
<TD><CODE>(talo : Str) -> N</CODE></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkA</CODE></TD>
|
|
<TD><CODE>(hieno : Str) -> A</CODE></TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises</H3>
|
|
<P>
|
|
1. Try out the morphological paradigms in different languages. Do
|
|
as follows:
|
|
</P>
|
|
<PRE>
|
|
> i -path=alltenses -retain alltenses/ParadigmsGer.gfo
|
|
> cc -table mkN "Farbe"
|
|
> cc -table mkA "gut" "besser" "beste"
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Example: English</H2>
|
|
<P>
|
|
<a name="secenglish"></a>
|
|
</P>
|
|
<P>
|
|
We assume the abstract syntax <CODE>Foods</CODE> from <a href="#chaptwo">Lesson 3</a>.
|
|
</P>
|
|
<P>
|
|
We don't need to think about inflection and agreement, but just pick
|
|
functions from the resource grammar library.
|
|
</P>
|
|
<P>
|
|
We need a path with
|
|
</P>
|
|
<UL>
|
|
<LI>the current directory <CODE>.</CODE>
|
|
<LI>the directory <CODE>../foods</CODE>, in which <CODE>Foods.gf</CODE> resides.
|
|
<LI>the library directory <CODE>present</CODE>, which is relative to the
|
|
environment variable <CODE>GF_LIB_PATH</CODE>
|
|
</UL>
|
|
|
|
<P>
|
|
Thus the beginning of the module is
|
|
</P>
|
|
<PRE>
|
|
--# -path=.:../foods:present
|
|
|
|
concrete FoodsEng of Foods = open SyntaxEng,ParadigmsEng in {
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>English example: linearization types and combination rules</H3>
|
|
<P>
|
|
As linearization types, we use clauses for <CODE>Phrase</CODE>, noun phrases
|
|
for <CODE>Item</CODE>, common nouns for <CODE>Kind</CODE>, and adjectival phrases for <CODE>Quality</CODE>.
|
|
</P>
|
|
<PRE>
|
|
lincat
|
|
Phrase = Cl ;
|
|
Item = NP ;
|
|
Kind = CN ;
|
|
Quality = AP ;
|
|
</PRE>
|
|
<P>
|
|
Now the combination rules we need almost write themselves automatically:
|
|
</P>
|
|
<PRE>
|
|
lin
|
|
Is item quality = mkCl item quality ;
|
|
This kind = mkNP this_QuantSg kind ;
|
|
That kind = mkNP that_QuantSg kind ;
|
|
These kind = mkNP these_QuantPl kind ;
|
|
Those kind = mkNP those_QuantPl kind ;
|
|
QKind quality kind = mkCN quality kind ;
|
|
Very quality = mkAP very_AdA quality ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>English example: lexical rules</H3>
|
|
<P>
|
|
We use resource paradigms and lexical insertion rules.
|
|
</P>
|
|
<P>
|
|
The two-place noun paradigm is needed only once, for
|
|
<I>fish</I> - everythins else is regular.
|
|
</P>
|
|
<PRE>
|
|
Wine = mkCN (mkN "wine") ;
|
|
Pizza = mkCN (mkN "pizza") ;
|
|
Cheese = mkCN (mkN "cheese") ;
|
|
Fish = mkCN (mkN "fish" "fish") ;
|
|
Fresh = mkAP (mkA "fresh") ;
|
|
Warm = mkAP (mkA "warm") ;
|
|
Italian = mkAP (mkA "Italian") ;
|
|
Expensive = mkAP (mkA "expensive") ;
|
|
Delicious = mkAP (mkA "delicious") ;
|
|
Boring = mkAP (mkA "boring") ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>English example: exercises</H3>
|
|
<P>
|
|
1. Compile the grammar <CODE>FoodsEng</CODE> and generate
|
|
and parse some sentences.
|
|
</P>
|
|
<P>
|
|
2. Write a concrete syntax of <CODE>Foods</CODE> for Italian
|
|
or some other language included in the resource library. You can
|
|
compare the results with the hand-written
|
|
grammars presented earlier in this tutorial.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Functor implementation of multilingual grammars</H2>
|
|
<P>
|
|
<a name="secfunctor"></a>
|
|
</P>
|
|
<H3>New language by copy and paste</H3>
|
|
<P>
|
|
If you write a concrete syntax of <CODE>Foods</CODE> for some other
|
|
language, much of the code will look exactly the same
|
|
as for English. This is because
|
|
</P>
|
|
<UL>
|
|
<LI>the <CODE>Syntax</CODE> API is the same for all languages (because
|
|
all languages in the resource package do implement the same
|
|
syntactic structures)
|
|
<LI>languages tend to use the syntactic structures in similar ways
|
|
</UL>
|
|
|
|
<P>
|
|
But lexical rules are more language-dependent.
|
|
</P>
|
|
<P>
|
|
Thus, to port a grammar to a new language, you
|
|
</P>
|
|
<OL>
|
|
<LI>copy the concrete syntax of a given language
|
|
<LI>change the words (strings and inflection paradigms)
|
|
</OL>
|
|
|
|
<P>
|
|
Can we avoid this programming by copy-and-paste?
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Functors: functions on the module level</H3>
|
|
<P>
|
|
<B>Functors</B> familiar from the functional programming languages ML and OCaml,
|
|
also known as <B>parametrized modules</B>.
|
|
</P>
|
|
<P>
|
|
In GF, a functor is a module that <CODE>open</CODE>s one or more <B>interfaces</B>.
|
|
</P>
|
|
<P>
|
|
An <CODE>interface</CODE> is a module similar to a <CODE>resource</CODE>, but it only
|
|
contains the <I>types</I> of <CODE>oper</CODE>s, not (necessarily) their definitions.
|
|
</P>
|
|
<P>
|
|
Syntax for functors: add the keyword <CODE>incomplete</CODE>. We will use the header
|
|
</P>
|
|
<PRE>
|
|
incomplete concrete FoodsI of Foods = open Syntax, LexFoods in
|
|
</PRE>
|
|
<P>
|
|
where
|
|
</P>
|
|
<PRE>
|
|
interface Syntax -- the resource grammar interface
|
|
interface LexFoods -- the domain lexicon interface
|
|
</PRE>
|
|
<P>
|
|
When we moreover have
|
|
</P>
|
|
<PRE>
|
|
instance SyntaxEng of Syntax -- the English resource grammar
|
|
instance LexFoodsEng of LexFoods -- the English domain lexicon
|
|
</PRE>
|
|
<P>
|
|
we can write a <B>functor instantiation</B>,
|
|
</P>
|
|
<PRE>
|
|
concrete FoodsGer of Foods = FoodsI with
|
|
(Syntax = SyntaxGer),
|
|
(LexFoods = LexFoodsGer) ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Code for the Foods functor</H3>
|
|
<PRE>
|
|
--# -path=.:../foods
|
|
|
|
incomplete concrete FoodsI of Foods = open Syntax, LexFoods in {
|
|
lincat
|
|
Phrase = Cl ;
|
|
Item = NP ;
|
|
Kind = CN ;
|
|
Quality = AP ;
|
|
lin
|
|
Is item quality = mkCl item quality ;
|
|
This kind = mkNP this_QuantSg kind ;
|
|
That kind = mkNP that_QuantSg kind ;
|
|
These kind = mkNP these_QuantPl kind ;
|
|
Those kind = mkNP those_QuantPl kind ;
|
|
QKind quality kind = mkCN quality kind ;
|
|
Very quality = mkAP very_AdA quality ;
|
|
|
|
Wine = mkCN wine_N ;
|
|
Pizza = mkCN pizza_N ;
|
|
Cheese = mkCN cheese_N ;
|
|
Fish = mkCN fish_N ;
|
|
Fresh = mkAP fresh_A ;
|
|
Warm = mkAP warm_A ;
|
|
Italian = mkAP italian_A ;
|
|
Expensive = mkAP expensive_A ;
|
|
Delicious = mkAP delicious_A ;
|
|
Boring = mkAP boring_A ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Code for the LexFoods interface</H3>
|
|
<P>
|
|
<a name="secinterface"></a>
|
|
</P>
|
|
<PRE>
|
|
interface LexFoods = open Syntax in {
|
|
oper
|
|
wine_N : N ;
|
|
pizza_N : N ;
|
|
cheese_N : N ;
|
|
fish_N : N ;
|
|
fresh_A : A ;
|
|
warm_A : A ;
|
|
italian_A : A ;
|
|
expensive_A : A ;
|
|
delicious_A : A ;
|
|
boring_A : A ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Code for a German instance of the lexicon</H3>
|
|
<PRE>
|
|
instance LexFoodsGer of LexFoods = open SyntaxGer, ParadigmsGer in {
|
|
oper
|
|
wine_N = mkN "Wein" ;
|
|
pizza_N = mkN "Pizza" "Pizzen" feminine ;
|
|
cheese_N = mkN "Käse" "Käsen" masculine ;
|
|
fish_N = mkN "Fisch" ;
|
|
fresh_A = mkA "frisch" ;
|
|
warm_A = mkA "warm" "wärmer" "wärmste" ;
|
|
italian_A = mkA "italienisch" ;
|
|
expensive_A = mkA "teuer" ;
|
|
delicious_A = mkA "köstlich" ;
|
|
boring_A = mkA "langweilig" ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Code for a German functor instantiation</H3>
|
|
<PRE>
|
|
--# -path=.:../foods:present
|
|
|
|
concrete FoodsGer of Foods = FoodsI with
|
|
(Syntax = SyntaxGer),
|
|
(LexFoods = LexFoodsGer) ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Adding languages to a functor implementation</H3>
|
|
<P>
|
|
Just two modules are needed:
|
|
</P>
|
|
<UL>
|
|
<LI>a domain lexicon instance
|
|
<LI>a functor instantiation
|
|
</UL>
|
|
|
|
<P>
|
|
The functor instantiation is completely mechanical to write.
|
|
</P>
|
|
<P>
|
|
The domain lexicon instance requires some knowledge of the words of the
|
|
language:
|
|
</P>
|
|
<UL>
|
|
<LI>what words are used for which concepts
|
|
<LI>how the words are
|
|
<LI>features such as genders
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Example: adding Finnish</H3>
|
|
<P>
|
|
Lexicon instance
|
|
</P>
|
|
<PRE>
|
|
instance LexFoodsFin of LexFoods = open SyntaxFin, ParadigmsFin in {
|
|
oper
|
|
wine_N = mkN "viini" ;
|
|
pizza_N = mkN "pizza" ;
|
|
cheese_N = mkN "juusto" ;
|
|
fish_N = mkN "kala" ;
|
|
fresh_A = mkA "tuore" ;
|
|
warm_A = mkA "lämmin" ;
|
|
italian_A = mkA "italialainen" ;
|
|
expensive_A = mkA "kallis" ;
|
|
delicious_A = mkA "herkullinen" ;
|
|
boring_A = mkA "tylsä" ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
Functor instantiation
|
|
</P>
|
|
<PRE>
|
|
--# -path=.:../foods:present
|
|
|
|
concrete FoodsFin of Foods = FoodsI with
|
|
(Syntax = SyntaxFin),
|
|
(LexFoods = LexFoodsFin) ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>A design pattern</H3>
|
|
<P>
|
|
This can be seen as a <I>design pattern</I> for multilingual grammars:
|
|
</P>
|
|
<PRE>
|
|
concrete DomainL*
|
|
|
|
instance LexDomainL instance SyntaxL*
|
|
|
|
incomplete concrete DomainI
|
|
/ | \
|
|
interface LexDomain abstract Domain interface Syntax*
|
|
</PRE>
|
|
<P>
|
|
Modules marked with <CODE>*</CODE> are either given in the library, or trivial.
|
|
</P>
|
|
<P>
|
|
Of the hand-written modules, only <CODE>LexDomainL</CODE> is language-dependent.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Functors: exercises</H3>
|
|
<P>
|
|
1. Compile and test <CODE>FoodsGer</CODE>.
|
|
</P>
|
|
<P>
|
|
2. Refactor <CODE>FoodsEng</CODE> into a functor instantiation.
|
|
</P>
|
|
<P>
|
|
3. Instantiate the functor <CODE>FoodsI</CODE> to some language of
|
|
your choice.
|
|
</P>
|
|
<P>
|
|
4. Design a small grammar that can be used for controlling
|
|
an MP3 player. The grammar should be able to recognize commands such
|
|
as <I>play this song</I>, with the following variations:
|
|
</P>
|
|
<UL>
|
|
<LI>verbs: <I>play</I>, <I>remove</I>
|
|
<LI>objects: <I>song</I>, <I>artist</I>
|
|
<LI>determiners: <I>this</I>, <I>the previous</I>
|
|
<LI>verbs without arguments: <I>stop</I>, <I>pause</I>
|
|
</UL>
|
|
|
|
<P>
|
|
The implementation goes in the following phases:
|
|
</P>
|
|
<OL>
|
|
<LI>abstract syntax
|
|
<LI>(optional:) prototype string-based concrete syntax
|
|
<LI>functor over resource syntax and lexicon interface
|
|
<LI>lexicon instance for the first language
|
|
<LI>functor instantiation for the first language
|
|
<LI>lexicon instance for the second language
|
|
<LI>functor instantiation for the second language
|
|
<LI>...
|
|
</OL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Restricted inheritance</H2>
|
|
<H3>A problem with functors</H3>
|
|
<P>
|
|
Problem: a functor only works when all languages use the resource <CODE>Syntax</CODE>
|
|
in the same way.
|
|
</P>
|
|
<P>
|
|
Example (contrived): assume that English has
|
|
no word for <CODE>Pizza</CODE>, but has to use the paraphrase <I>Italian pie</I>.
|
|
This is no longer a noun <CODE>N</CODE>, but a complex phrase
|
|
in the category <CODE>CN</CODE>.
|
|
</P>
|
|
<P>
|
|
Possible solution: change interface the <CODE>LexFoods</CODE> with
|
|
</P>
|
|
<PRE>
|
|
oper pizza_CN : CN ;
|
|
</PRE>
|
|
<P>
|
|
Problem with this solution:
|
|
</P>
|
|
<UL>
|
|
<LI>we may end up changing the interface and the function with each new language
|
|
<LI>we must every time also change the instances for the old languages to maintain
|
|
type correctness
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Restricted inheritance: include or exclude</H3>
|
|
<P>
|
|
A module may inherit just a selection of names.
|
|
</P>
|
|
<P>
|
|
Example: the <CODE>FoodMarket</CODE> example "Rsecarchitecture:
|
|
</P>
|
|
<PRE>
|
|
abstract Foodmarket = Food, Fruit [Peach], Mushroom - [Agaric]
|
|
</PRE>
|
|
<P>
|
|
Here, from <CODE>Fruit</CODE> we include <CODE>Peach</CODE> only, and from <CODE>Mushroom</CODE>
|
|
we exclude <CODE>Agaric</CODE>.
|
|
</P>
|
|
<P>
|
|
A concrete syntax of <CODE>Foodmarket</CODE> must make the analogous restrictions.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>The functor problem solved</H3>
|
|
<P>
|
|
The English instantiation inherits the functor
|
|
implementation except for the constant <CODE>Pizza</CODE>. This constant
|
|
is defined in the body instead:
|
|
</P>
|
|
<PRE>
|
|
--# -path=.:../foods:present
|
|
|
|
concrete FoodsEng of Foods = FoodsI - [Pizza] with
|
|
(Syntax = SyntaxEng),
|
|
(LexFoods = LexFoodsEng) **
|
|
open SyntaxEng, ParadigmsEng in {
|
|
|
|
lin Pizza = mkCN (mkA "Italian") (mkN "pie") ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Grammar reuse</H2>
|
|
<P>
|
|
Abstract syntax modules can be used as interfaces,
|
|
and concrete syntaxes as their instances.
|
|
</P>
|
|
<P>
|
|
The following correspondencies are then applied:
|
|
</P>
|
|
<PRE>
|
|
cat C <---> oper C : Type
|
|
|
|
fun f : A <---> oper f : A
|
|
|
|
lincat C = T <---> oper C : Type = T
|
|
|
|
lin f = t <---> oper f : A = t
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Library exercises</H3>
|
|
<P>
|
|
1. Find resource grammar terms for the following
|
|
English phrases (in the category <CODE>Phr</CODE>). You can first try to
|
|
build the terms manually.
|
|
</P>
|
|
<P>
|
|
<I>every man loves a woman</I>
|
|
</P>
|
|
<P>
|
|
<I>this grammar speaks more than ten languages</I>
|
|
</P>
|
|
<P>
|
|
<I>which languages aren't in the grammar</I>
|
|
</P>
|
|
<P>
|
|
<I>which languages did you want to speak</I>
|
|
</P>
|
|
<P>
|
|
Then translate the phrases to other languages.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Tenses</H2>
|
|
<P>
|
|
<a name="sectense"></a>
|
|
</P>
|
|
<P>
|
|
In <CODE>Foods</CODE> grammars, we have used the path
|
|
</P>
|
|
<PRE>
|
|
--# -path=.:../foods
|
|
</PRE>
|
|
<P>
|
|
The library subdirectory <CODE>present</CODE> is a restricted version
|
|
of the resource, with only present tense of verbs and sentences.
|
|
</P>
|
|
<P>
|
|
By just changing the path, we get all tenses:
|
|
</P>
|
|
<PRE>
|
|
--# -path=.:../foods:alltenses
|
|
</PRE>
|
|
<P>
|
|
Now we can see all the tenses of phrases, by using the <CODE>-all</CODE> flag
|
|
in linearization:
|
|
</P>
|
|
<PRE>
|
|
> gr | l -all
|
|
This wine is delicious
|
|
Is this wine delicious
|
|
This wine isn't delicious
|
|
Isn't this wine delicious
|
|
This wine is not delicious
|
|
Is this wine not delicious
|
|
This wine has been delicious
|
|
Has this wine been delicious
|
|
This wine hasn't been delicious
|
|
Hasn't this wine been delicious
|
|
This wine has not been delicious
|
|
Has this wine not been delicious
|
|
This wine was delicious
|
|
Was this wine delicious
|
|
This wine wasn't delicious
|
|
Wasn't this wine delicious
|
|
This wine was not delicious
|
|
Was this wine not delicious
|
|
This wine had been delicious
|
|
Had this wine been delicious
|
|
This wine hadn't been delicious
|
|
Hadn't this wine been delicious
|
|
This wine had not been delicious
|
|
Had this wine not been delicious
|
|
This wine will be delicious
|
|
Will this wine be delicious
|
|
This wine won't be delicious
|
|
Won't this wine be delicious
|
|
This wine will not be delicious
|
|
Will this wine not be delicious
|
|
This wine will have been delicious
|
|
Will this wine have been delicious
|
|
This wine won't have been delicious
|
|
Won't this wine have been delicious
|
|
This wine will not have been delicious
|
|
Will this wine not have been delicious
|
|
This wine would be delicious
|
|
Would this wine be delicious
|
|
This wine wouldn't be delicious
|
|
Wouldn't this wine be delicious
|
|
This wine would not be delicious
|
|
Would this wine not be delicious
|
|
This wine would have been delicious
|
|
Would this wine have been delicious
|
|
This wine wouldn't have been delicious
|
|
Wouldn't this wine have been delicious
|
|
This wine would not have been delicious
|
|
Would this wine not have been delicious
|
|
</PRE>
|
|
<P>
|
|
We also see
|
|
</P>
|
|
<UL>
|
|
<LI>polarity (positive vs. negative)
|
|
<LI>word order (direct vs. inverted)
|
|
<LI>variation between contracted and full negation
|
|
</UL>
|
|
|
|
<P>
|
|
The list is even longer in languages that have more
|
|
tenses and moods, e.g. the Romance languages.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H1>Lesson 5: Refining semantics in abstract syntax</H1>
|
|
<P>
|
|
<a name="chapsix"></a>
|
|
</P>
|
|
<P>
|
|
Goals:
|
|
</P>
|
|
<UL>
|
|
<LI>include semantic conditions in grammars, by using
|
|
<UL>
|
|
<LI><B>dependent types</B>
|
|
<LI><B>higher order abstract syntax</B>
|
|
<LI>proof objects
|
|
<LI>semantic definitions
|
|
<P></P>
|
|
These concepts are inherited from <B>type theory</B> (more precisely:
|
|
constructive type theory, or Martin-Löf type theory).
|
|
<P></P>
|
|
Type theory is the basis <B>logical frameworks</B>.
|
|
<P></P>
|
|
GF = logical framework + concrete syntax.
|
|
</UL>
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Dependent types</H2>
|
|
<P>
|
|
<a name="secsmarthouse"></a>
|
|
</P>
|
|
<P>
|
|
Problem: to express <B>conditions of semantic well-formedness</B>.
|
|
</P>
|
|
<P>
|
|
Example: a voice command system for a "smart house" wants to
|
|
eliminate meaningless commands.
|
|
</P>
|
|
<P>
|
|
Thus we want to restrict particular actions to
|
|
particular devices - we can <I>dim a light</I>, but we cannot
|
|
<I>dim a fan</I>.
|
|
</P>
|
|
<P>
|
|
The following example is borrowed from the
|
|
Regulus Book (Rayner & al. 2006).
|
|
</P>
|
|
<P>
|
|
A simple example is a "smart house" system, which
|
|
defines voice commands for household appliances.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>A dependent type system</H3>
|
|
<P>
|
|
Ontology:
|
|
</P>
|
|
<UL>
|
|
<LI>there are commands and device kinds
|
|
<LI>for each kind of device, there are devices and actions
|
|
<LI>a command concerns an action of some kind on a device of the same kind
|
|
</UL>
|
|
|
|
<P>
|
|
Abstract syntax formalizing this:
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
Command ;
|
|
Kind ;
|
|
Device Kind ; -- argument type Kind
|
|
Action Kind ;
|
|
fun
|
|
CAction : (k : Kind) -> Action k -> Device k -> Command ;
|
|
</PRE>
|
|
<P>
|
|
<CODE>Device</CODE> and <CODE>Action</CODE> are both dependent types.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Examples of devices and actions</H3>
|
|
<P>
|
|
Assume the kinds <CODE>light</CODE> and <CODE>fan</CODE>,
|
|
</P>
|
|
<PRE>
|
|
light, fan : Kind ;
|
|
dim : Action light ;
|
|
</PRE>
|
|
<P>
|
|
Given a kind, <I>k</I>, you can form the device <I>the k</I>.
|
|
</P>
|
|
<PRE>
|
|
DKindOne : (k : Kind) -> Device k ; -- the light
|
|
</PRE>
|
|
<P>
|
|
Now we can form the syntax tree
|
|
</P>
|
|
<PRE>
|
|
CAction light dim (DKindOne light)
|
|
</PRE>
|
|
<P>
|
|
but we cannot form the trees
|
|
</P>
|
|
<PRE>
|
|
CAction light dim (DKindOne fan)
|
|
CAction fan dim (DKindOne light)
|
|
CAction fan dim (DKindOne fan)
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Linearization and parsing with dependent types</H3>
|
|
<P>
|
|
Concrete syntax does not know if a category is a dependent type.
|
|
</P>
|
|
<PRE>
|
|
lincat Action = {s : Str} ;
|
|
lin CAction _ act dev = {s = act.s ++ dev.s} ;
|
|
</PRE>
|
|
<P>
|
|
Notice that the <CODE>Kind</CODE> argument is suppressed in linearization.
|
|
</P>
|
|
<P>
|
|
Parsing with dependent types is performed in two phases:
|
|
</P>
|
|
<OL>
|
|
<LI>context-free parsing
|
|
<LI>filtering through type checker
|
|
</OL>
|
|
|
|
<P>
|
|
By just doing the first phase, the <CODE>kind</CODE> argument is not found:
|
|
</P>
|
|
<PRE>
|
|
> parse "dim the light"
|
|
CAction ? dim (DKindOne light)
|
|
</PRE>
|
|
<P>
|
|
Moreover, type-incorrect commands are not rejected:
|
|
</P>
|
|
<PRE>
|
|
> parse "dim the fan"
|
|
CAction ? dim (DKindOne fan)
|
|
</PRE>
|
|
<P>
|
|
The term <CODE>?</CODE> is a <B>metavariable</B>, returned by the parser
|
|
for any subtree that is suppressed by a linearization rule.
|
|
These are the same kind of metavariables as were used <a href="#secediting">here</a>
|
|
to mark incomplete parts of trees in the syntax editor.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Solving metavariables</H3>
|
|
<P>
|
|
Use the command <CODE>put_tree = pt</CODE> with the option <CODE>-typecheck</CODE>:
|
|
</P>
|
|
<PRE>
|
|
> parse "dim the light" | put_tree -typecheck
|
|
CAction light dim (DKindOne light)
|
|
</PRE>
|
|
<P>
|
|
The <CODE>typecheck</CODE> process may fail, in which case an error message
|
|
is shown and no tree is returned:
|
|
</P>
|
|
<PRE>
|
|
> parse "dim the fan" | put_tree -typecheck
|
|
|
|
Error in tree UCommand (CAction ? 0 dim (DKindOne fan)) :
|
|
(? 0 <> fan) (? 0 <> light)
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Polymorphism</H2>
|
|
<P>
|
|
<a name="secpolymorphic"></a>
|
|
</P>
|
|
<P>
|
|
Sometimes an action can be performed on all kinds of devices.
|
|
</P>
|
|
<P>
|
|
This is represented as a function that takes a <CODE>Kind</CODE> as an argument
|
|
and produce an <CODE>Action</CODE> for that <CODE>Kind</CODE>:
|
|
</P>
|
|
<PRE>
|
|
fun switchOn, switchOff : (k : Kind) -> Action k ;
|
|
</PRE>
|
|
<P>
|
|
Functions of this kind are called <B>polymorphic</B>.
|
|
</P>
|
|
<P>
|
|
We can use this kind of polymorphism in concrete syntax as well,
|
|
to express Haskell-type library functions:
|
|
</P>
|
|
<PRE>
|
|
oper const :(a,b : Type) -> a -> b -> a =
|
|
\_,_,c,_ -> c ;
|
|
|
|
oper flip : (a,b,c : Type) -> (a -> b ->c) -> b -> a -> c =
|
|
\_,_,_,f,x,y -> f y x ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Dependent types: exercises</H3>
|
|
<P>
|
|
1. Write an abstract syntax module with above contents
|
|
and an appropriate English concrete syntax. Try to parse the commands
|
|
<I>dim the light</I> and <I>dim the fan</I>, with and without <CODE>solve</CODE> filtering.
|
|
</P>
|
|
<P>
|
|
2. Perform random and exhaustive generation, with and without
|
|
<CODE>solve</CODE> filtering.
|
|
</P>
|
|
<P>
|
|
3. Add some device kinds and actions to the grammar.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Proof objects</H2>
|
|
<P>
|
|
<B>Curry-Howard isomorphism</B> = <B>propositions as types principle</B>:
|
|
a proposition is a type of proofs (= proof objects).
|
|
</P>
|
|
<P>
|
|
Example: define the <I>less than</I> proposition for natural numbers,
|
|
</P>
|
|
<PRE>
|
|
cat Nat ;
|
|
fun Zero : Nat ;
|
|
fun Succ : Nat -> Nat ;
|
|
</PRE>
|
|
<P>
|
|
Define inductively what it means for a number <I>x</I> to be <I>less than</I>
|
|
a number <I>y</I>:
|
|
</P>
|
|
<UL>
|
|
<LI><CODE>Zero</CODE> is less than <CODE>Succ</CODE> <I>y</I> for any <I>y</I>.
|
|
<LI>If <I>x</I> is less than <I>y</I>, then <CODE>Succ</CODE> <I>x</I> is less than <CODE>Succ</CODE> <I>y</I>.
|
|
</UL>
|
|
|
|
<P>
|
|
Expressing these axioms in type theory
|
|
with a dependent type <CODE>Less</CODE> <I>x y</I> and two functions constructing
|
|
its objects:
|
|
</P>
|
|
<PRE>
|
|
cat Less Nat Nat ;
|
|
fun lessZ : (y : Nat) -> Less Zero (Succ y) ;
|
|
fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ;
|
|
</PRE>
|
|
<P>
|
|
Example: the fact that 2 is less that 4 has the proof object
|
|
</P>
|
|
<PRE>
|
|
lessS (Succ Zero) (Succ (Succ (Succ Zero)))
|
|
(lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero)))
|
|
: Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero))))
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Proof-carrying documents</H3>
|
|
<P>
|
|
Idea: to be semantically well-formed, the abstract syntax of a document
|
|
must contain a proof of some property,
|
|
although the proof is not shown in the concrete document.
|
|
</P>
|
|
<P>
|
|
Example: documents describing flight connections:
|
|
</P>
|
|
<P>
|
|
<I>To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.</I>
|
|
</P>
|
|
<P>
|
|
The well-formedness of this text is partly expressible by dependent typing:
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
City ;
|
|
Flight City City ;
|
|
fun
|
|
Gothenburg, Frankfurt, Prague : City ;
|
|
LH3043 : Flight Gothenburg Frankfurt ;
|
|
OK0537 : Flight Frankfurt Prague ;
|
|
</PRE>
|
|
<P>
|
|
To extend the conditions to flight connections, we introduce a category
|
|
of proofs that a change is possible:
|
|
</P>
|
|
<PRE>
|
|
cat IsPossible (x,y,z : City)(Flight x y)(Flight y z) ;
|
|
</PRE>
|
|
<P>
|
|
A legal connection is formed by the function
|
|
</P>
|
|
<PRE>
|
|
fun Connect : (x,y,z : City) ->
|
|
(u : Flight x y) -> (v : Flight y z) ->
|
|
IsPossible x y z u v -> Flight x z ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Restricted polymorphism</H2>
|
|
<P>
|
|
Above, all Actions were either of
|
|
</P>
|
|
<UL>
|
|
<LI><B>monomorphic</B>: defined for one Kind
|
|
<LI><B>polymorphic</B>: defined for all Kinds
|
|
</UL>
|
|
|
|
<P>
|
|
To make this scale up for new Kinds, we can refine this to
|
|
<B>restricted polymorphism</B>: defined for Kinds of a certain <B>class</B>
|
|
</P>
|
|
<P>
|
|
The notion of class uses the Curry-Howard isomorphism as follows:
|
|
</P>
|
|
<UL>
|
|
<LI>a class is a <B>predicate</B> of Kinds --- i.e. a type depending of Kinds
|
|
<LI>a Kind is in a class if there is a proof object of this type
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Example: classes for switching and dimming</H3>
|
|
<P>
|
|
We modify the smart house grammar:
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
Switchable Kind ;
|
|
Dimmable Kind ;
|
|
fun
|
|
switchable_light : Switchable light ;
|
|
switchable_fan : Switchable fan ;
|
|
dimmable_light : Dimmable light ;
|
|
|
|
switchOn : (k : Kind) -> Switchable k -> Action k ;
|
|
dim : (k : Kind) -> Dimmable k -> Action k ;
|
|
</PRE>
|
|
<P>
|
|
Classes for new actions can be added incrementally.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Variable bindings</H2>
|
|
<P>
|
|
<a name="secbinding"></a>
|
|
</P>
|
|
<P>
|
|
Mathematical notation and programming languages have
|
|
expressions that <B>bind</B> variables.
|
|
</P>
|
|
<P>
|
|
Example: universal quantifier formula
|
|
</P>
|
|
<PRE>
|
|
(All x)B(x)
|
|
</PRE>
|
|
<P>
|
|
The variable <CODE>x</CODE> has a <B>binding</B> <CODE>(All x)</CODE>, and
|
|
occurs <B>bound</B> in the <B>body</B> <CODE>B(x)</CODE>.
|
|
</P>
|
|
<P>
|
|
Examples from informal mathematical language:
|
|
</P>
|
|
<PRE>
|
|
for all x, x is equal to x
|
|
|
|
the function that for any numbers x and y returns the maximum of x+y
|
|
and x*y
|
|
|
|
Let x be a natural number. Assume that x is even. Then x + 3 is odd.
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Higher-order abstract syntax</H3>
|
|
<P>
|
|
Abstract syntax can use functions as arguments:
|
|
</P>
|
|
<PRE>
|
|
cat Ind ; Prop ;
|
|
fun All : (Ind -> Prop) -> Prop
|
|
</PRE>
|
|
<P>
|
|
where <CODE>Ind</CODE> is the type of individuals and <CODE>Prop</CODE>,
|
|
the type of propositions.
|
|
</P>
|
|
<P>
|
|
Let us add an equality predicate
|
|
</P>
|
|
<PRE>
|
|
fun Eq : Ind -> Ind -> Prop
|
|
</PRE>
|
|
<P>
|
|
Now we can form the tree
|
|
</P>
|
|
<PRE>
|
|
All (\x -> Eq x x)
|
|
</PRE>
|
|
<P>
|
|
which we want to relate to the ordinary notation
|
|
</P>
|
|
<PRE>
|
|
(All x)(x = x)
|
|
</PRE>
|
|
<P>
|
|
In <B>higher-order abstract syntax</B> (HOAS), all variable bindings are
|
|
expressed using higher-order syntactic constructors.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Higher-order abstract syntax: linearization</H3>
|
|
<P>
|
|
HOAS has proved to be useful in the semantics and computer implementation of
|
|
variable-binding expressions.
|
|
</P>
|
|
<P>
|
|
How do we relate HOAS to the concrete syntax?
|
|
</P>
|
|
<P>
|
|
In GF, we write
|
|
</P>
|
|
<PRE>
|
|
fun All : (Ind -> Prop) -> Prop
|
|
lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s}
|
|
</PRE>
|
|
<P>
|
|
General rule: if an argument type of a <CODE>fun</CODE> function is
|
|
a function type <CODE>A -> C</CODE>, the linearization type of
|
|
this argument is the linearization type of <CODE>C</CODE>
|
|
together with a new field <CODE>$0 : Str</CODE>.
|
|
</P>
|
|
<P>
|
|
The argument <CODE>B</CODE> thus has the linearization type
|
|
</P>
|
|
<PRE>
|
|
{s : Str ; $0 : Str},
|
|
</PRE>
|
|
<P>
|
|
If there are more bindings, we add <CODE>$1</CODE>, <CODE>$2</CODE>, etc.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Eta expansion</H3>
|
|
<P>
|
|
To make sense of linearization, syntax trees must be
|
|
<B>eta-expanded</B>: for any function of type
|
|
</P>
|
|
<PRE>
|
|
A -> B
|
|
</PRE>
|
|
<P>
|
|
an eta-expanded syntax tree has the form
|
|
</P>
|
|
<PRE>
|
|
\x -> b
|
|
</PRE>
|
|
<P>
|
|
where <CODE>b : B</CODE> under the assumption <CODE>x : A</CODE>.
|
|
</P>
|
|
<P>
|
|
Given the linearization rule
|
|
</P>
|
|
<PRE>
|
|
lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"}
|
|
</PRE>
|
|
<P>
|
|
the linearization of the tree
|
|
</P>
|
|
<PRE>
|
|
\x -> Eq x x
|
|
</PRE>
|
|
<P>
|
|
is the record
|
|
</P>
|
|
<PRE>
|
|
{$0 = "x", s = ["( x = x )"]}
|
|
</PRE>
|
|
<P>
|
|
Then we can compute the linearization of the formula,
|
|
</P>
|
|
<PRE>
|
|
All (\x -> Eq x x) --> {s = "[( All x ) ( x = x )]"}.
|
|
</PRE>
|
|
<P>
|
|
The linearization of the variable <CODE>x</CODE> is,
|
|
"automagically", the string <CODE>"x"</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Parsing variable bindings</H3>
|
|
<P>
|
|
GF can treat any one-word string as a variable symbol.
|
|
</P>
|
|
<PRE>
|
|
> p -cat=Prop "( All x ) ( x = x )"
|
|
All (\x -> Eq x x)
|
|
</PRE>
|
|
<P>
|
|
Variables must be bound if they are used:
|
|
</P>
|
|
<PRE>
|
|
> p -cat=Prop "( All x ) ( x = y )"
|
|
no tree found
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on variable bindings</H3>
|
|
<P>
|
|
1. Write an abstract syntax of the whole
|
|
<B>predicate calculus</B>, with the
|
|
<B>connectives</B> "and", "or", "implies", and "not", and the
|
|
<B>quantifiers</B> "exists" and "for all". Use higher-order functions
|
|
to guarantee that unbounded variables do not occur.
|
|
</P>
|
|
<P>
|
|
2. Write a concrete syntax for your favourite
|
|
notation of predicate calculus. Use Latex as target language
|
|
if you want nice output. You can also try producing boolean
|
|
expressions of some programming language. Use as many parenthesis as you need to
|
|
guarantee non-ambiguity.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Semantic definitions</H2>
|
|
<P>
|
|
<a name="secdefdef"></a>
|
|
</P>
|
|
<P>
|
|
The <CODE>fun</CODE> judgements of GF are declarations of functions, giving their types.
|
|
</P>
|
|
<P>
|
|
Can we <B>compute</B> <CODE>fun</CODE> functions?
|
|
</P>
|
|
<P>
|
|
Mostly we are not interested, since functions are seen as constructors,
|
|
i.e. data forms - as usual with
|
|
</P>
|
|
<PRE>
|
|
fun Zero : Nat ;
|
|
fun Succ : Nat -> Nat ;
|
|
</PRE>
|
|
<P>
|
|
But it is also possible to give <B>semantic definitions</B> to functions.
|
|
The key word is <CODE>def</CODE>:
|
|
</P>
|
|
<PRE>
|
|
fun one : Nat ;
|
|
def one = Succ Zero ;
|
|
|
|
fun twice : Nat -> Nat ;
|
|
def twice x = plus x x ;
|
|
|
|
fun plus : Nat -> Nat -> Nat ;
|
|
def
|
|
plus x Zero = x ;
|
|
plus x (Succ y) = Succ (Sum x y) ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Computing a tree</H3>
|
|
<P>
|
|
Computation: follow a chain of definition until no definition
|
|
can be applied,
|
|
</P>
|
|
<PRE>
|
|
plus one one -->
|
|
plus (Succ Zero) (Succ Zero) -->
|
|
Succ (plus (Succ Zero) Zero) -->
|
|
Succ (Succ Zero)
|
|
</PRE>
|
|
<P>
|
|
Computation in GF is performed with the <CODE>put_term</CODE> command and the
|
|
<CODE>compute</CODE> transformation, e.g.
|
|
</P>
|
|
<PRE>
|
|
> parse -tr "1 + 1" | put_term -transform=compute -tr | l
|
|
plus one one
|
|
Succ (Succ Zero)
|
|
s(s(0))
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Definitional equality</H3>
|
|
<P>
|
|
Two trees are definitionally equal if they compute into the same tree.
|
|
</P>
|
|
<P>
|
|
Definitional equality does not guarantee sameness of linearization:
|
|
</P>
|
|
<PRE>
|
|
plus one one ===> 1 + 1
|
|
Succ (Succ Zero) ===> s(s(0))
|
|
</PRE>
|
|
<P>
|
|
The main use of this concept is in type checking: sameness of types.
|
|
</P>
|
|
<P>
|
|
Thus e.g. the following types are equal
|
|
</P>
|
|
<PRE>
|
|
Less Zero one
|
|
Less Zero (Succ Zero))
|
|
</PRE>
|
|
<P>
|
|
so that an object of one also is an object of the other.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Judgement forms for constructors</H3>
|
|
<P>
|
|
The judgement form <CODE>data</CODE> tells that a category has
|
|
certain functions as constructors:
|
|
</P>
|
|
<PRE>
|
|
data Nat = Succ | Zero ;
|
|
</PRE>
|
|
<P>
|
|
The type signatures of constructors are given separately,
|
|
</P>
|
|
<PRE>
|
|
fun Zero : Nat ;
|
|
fun Succ : Nat -> Nat ;
|
|
</PRE>
|
|
<P>
|
|
There is also a shorthand:
|
|
</P>
|
|
<PRE>
|
|
data Succ : Nat -> Nat ; === fun Succ : Nat -> Nat ;
|
|
data Nat = Succ ;
|
|
</PRE>
|
|
<P>
|
|
Notice: in <CODE>def</CODE> definitions, identifier patterns not
|
|
marked as <CODE>data</CODE> will be treated as variables.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on semantic definitions</H3>
|
|
<P>
|
|
1. Implement an interpreter of a small functional programming
|
|
language with natural numbers, lists, pairs, lambdas, etc. Use higher-order
|
|
abstract syntax with semantic definitions. As concrete syntax, use
|
|
your favourite programming language.
|
|
</P>
|
|
<P>
|
|
2. There is no termination checking for <CODE>def</CODE> definitions.
|
|
Construct an examples that makes type checking loop.
|
|
Type checking can be invoked with <CODE>put_term -transform=solve</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Lesson 6: Grammars of formal languages</H2>
|
|
<P>
|
|
<a name="chapseven"></a>
|
|
</P>
|
|
<P>
|
|
Goals:
|
|
</P>
|
|
<UL>
|
|
<LI>write grammars for formal languages (mathematical notation, programming languages)
|
|
<LI>interface between formal and natural langauges
|
|
<LI>implement a compiler by using GF
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Arithmetic expressions</H3>
|
|
<P>
|
|
We construct a calculator with addition, subtraction, multiplication, and
|
|
division of integers.
|
|
</P>
|
|
<PRE>
|
|
abstract Calculator = {
|
|
|
|
cat Exp ;
|
|
|
|
fun
|
|
EPlus, EMinus, ETimes, EDiv : Exp -> Exp -> Exp ;
|
|
EInt : Int -> Exp ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
The category <CODE>Int</CODE> is a built-in category of
|
|
integers. Its syntax trees <B>integer literals</B>, i.e.
|
|
sequences of digits:
|
|
</P>
|
|
<PRE>
|
|
5457455814608954681 : Int
|
|
</PRE>
|
|
<P>
|
|
These are the only objects of type <CODE>Int</CODE>:
|
|
grammars are not allowed to declare functions with <CODE>Int</CODE> as value type.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Concrete syntax: a simple approach</H3>
|
|
<P>
|
|
We begin with a
|
|
concrete syntax that always uses parentheses around binary
|
|
operator applications:
|
|
</P>
|
|
<PRE>
|
|
concrete CalculatorP of Calculator = {
|
|
|
|
lincat
|
|
Exp = SS ;
|
|
lin
|
|
EPlus = infix "+" ;
|
|
EMinus = infix "-" ;
|
|
ETimes = infix "*" ;
|
|
EDiv = infix "/" ;
|
|
EInt i = i ;
|
|
|
|
oper
|
|
infix : Str -> SS -> SS -> SS = \f,x,y ->
|
|
ss ("(" ++ x.s ++ f ++ y.s ++ ")") ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
Now we have
|
|
</P>
|
|
<PRE>
|
|
> linearize EPlus (EInt 2) (ETimes (EInt 3) (EInt 4))
|
|
( 2 + ( 3 * 4 ) )
|
|
</PRE>
|
|
<P>
|
|
First problems:
|
|
</P>
|
|
<UL>
|
|
<LI>to get rid of superfluous spaces and
|
|
<LI>to recognize integer literals in the parser
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Lexing and unlexing</H2>
|
|
<P>
|
|
<a name="seclexing"></a>
|
|
</P>
|
|
<P>
|
|
The input of parsing in GF is not just a string, but a list of
|
|
<B>tokens</B>, returned by a <B>lexer</B>.
|
|
</P>
|
|
<P>
|
|
The default lexer in GF returns chunks separated by spaces:
|
|
</P>
|
|
<PRE>
|
|
"(12 + (3 * 4))" ===> "(12", "+", "(3". "*". "4))"
|
|
</PRE>
|
|
<P>
|
|
The proper way would be
|
|
</P>
|
|
<PRE>
|
|
"(", "12", "+", "(", "3", "*", "4", ")", ")"
|
|
</PRE>
|
|
<P>
|
|
Moreover, the tokens <CODE>"12"</CODE>, <CODE>"3"</CODE>, and <CODE>"4"</CODE> should be recognized as
|
|
integer literals - they cannot be found in the grammar.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Lexers are invoked by flags to the command <CODE>put_string = ps</CODE>.
|
|
</P>
|
|
<PRE>
|
|
> put_string -lexcode "(2 + (3 * 4))"
|
|
( 2 + ( 3 * 4 ) )
|
|
</PRE>
|
|
<P>
|
|
This can be piped into a parser, as usual:
|
|
</P>
|
|
<PRE>
|
|
> ps -lexcode "(2 + (3 * 4))" | parse
|
|
EPlus (EInt 2) (ETimes (EInt 3) (EInt 4))
|
|
</PRE>
|
|
<P>
|
|
In linearization, we use a corresponding <B>unlexer</B>:
|
|
</P>
|
|
<PRE>
|
|
> linearize EPlus (EInt 2) (ETimes (EInt 3) (EInt 4)) | ps -unlexcode
|
|
(2 + (3 * 4))
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Most common lexers and unlexers</H3>
|
|
<TABLE ALIGN="center" CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>lexer</TH>
|
|
<TH>unlexer</TH>
|
|
<TH COLSPAN="2">description</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>chars</CODE></TD>
|
|
<TD><CODE>unchars</CODE></TD>
|
|
<TD>each character is a token</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>lexcode</CODE></TD>
|
|
<TD><CODE>unlexcode</CODE></TD>
|
|
<TD>program code conventions (uses Haskell's lex)</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>lexmixed</CODE></TD>
|
|
<TD><CODE>unlexmixed</CODE></TD>
|
|
<TD>like text, but between $ signs like code</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>lextext</CODE></TD>
|
|
<TD><CODE>unlextext</CODE></TD>
|
|
<TD>with conventions on punctuation and capitals</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>words</CODE></TD>
|
|
<TD><CODE>unwords</CODE></TD>
|
|
<TD>(default) tokens separated by space characters</TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Precedence and fixity</H2>
|
|
<P>
|
|
Arithmetic expressions should be unambiguous. If we write
|
|
</P>
|
|
<PRE>
|
|
2 + 3 * 4
|
|
</PRE>
|
|
<P>
|
|
it should be parsed as one, but not both, of
|
|
</P>
|
|
<PRE>
|
|
EPlus (EInt 2) (ETimes (EInt 3) (EInt 4))
|
|
ETimes (EPlus (EInt 2) (EInt 3)) (EInt 4)
|
|
</PRE>
|
|
<P>
|
|
We choose the former tree, because
|
|
multiplication has <B>higher precedence</B> than addition.
|
|
</P>
|
|
<P>
|
|
To express the latter tree, we have to use parentheses:
|
|
</P>
|
|
<PRE>
|
|
(2 + 3) * 4
|
|
</PRE>
|
|
<P>
|
|
The usual precedence rules:
|
|
</P>
|
|
<UL>
|
|
<LI>Integer constants and expressions in parentheses have the highest precedence.
|
|
<LI>Multiplication and division have equal precedence, lower than the highest
|
|
but higher than addition and subtraction, which are again equal.
|
|
<LI>All the four binary operations are <B>left-associative</B>:
|
|
<CODE>1 + 2 + 3</CODE> means the same as <CODE>(1 + 2) + 3</CODE>.
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Precedence as a parameter</H3>
|
|
<P>
|
|
Precedence can be made into an inherent feature of expressions:
|
|
</P>
|
|
<PRE>
|
|
oper
|
|
Prec : PType = Ints 2 ;
|
|
TermPrec : Type = {s : Str ; p : Prec} ;
|
|
|
|
mkPrec : Prec -> Str -> TermPrec = \p,s -> {s = s ; p = p} ;
|
|
|
|
lincat
|
|
Exp = TermPrec ;
|
|
</PRE>
|
|
<P>
|
|
Notice <CODE>Ints 2</CODE>: a parameter type, whose values are the integers
|
|
<CODE>0,1,2</CODE>.
|
|
</P>
|
|
<P>
|
|
Using precedence levels: compare the inherent precedence of an
|
|
expression with the expected precedence.
|
|
</P>
|
|
<UL>
|
|
<LI>if the inherent precedence is lower than the expected precedence,
|
|
use parentheses
|
|
<LI>otherwise, no parentheses are needed
|
|
</UL>
|
|
|
|
<P>
|
|
This idea is encoded in the operation
|
|
</P>
|
|
<PRE>
|
|
oper usePrec : TermPrec -> Prec -> Str = \x,p ->
|
|
case lessPrec x.p p of {
|
|
True => "(" x.s ")" ;
|
|
False => x.s
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
(We use <CODE>lessPrec</CODE> from <CODE>lib/prelude/Formal</CODE>.)
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Fixities</H3>
|
|
<P>
|
|
We can define left-associative infix expressions:
|
|
</P>
|
|
<PRE>
|
|
infixl : Prec -> Str -> (_,_ : TermPrec) -> TermPrec = \p,f,x,y ->
|
|
mkPrec p (usePrec x p ++ f ++ usePrec y (nextPrec p)) ;
|
|
</PRE>
|
|
<P>
|
|
Constant-like expressions (the highest level):
|
|
</P>
|
|
<PRE>
|
|
constant : Str -> TermPrec = mkPrec 2 ;
|
|
</PRE>
|
|
<P>
|
|
All these operations can be found in <CODE>lib/prelude/Formal</CODE>,
|
|
which has 5 levels.
|
|
</P>
|
|
<P>
|
|
Now we can write the whole concrete syntax of <CODE>Calculator</CODE> compactly:
|
|
</P>
|
|
<PRE>
|
|
concrete CalculatorC of Calculator = open Formal, Prelude in {
|
|
|
|
flags lexer = codelit ; unlexer = code ; startcat = Exp ;
|
|
|
|
lincat Exp = TermPrec ;
|
|
|
|
lin
|
|
EPlus = infixl 0 "+" ;
|
|
EMinus = infixl 0 "-" ;
|
|
ETimes = infixl 1 "*" ;
|
|
EDiv = infixl 1 "/" ;
|
|
EInt i = constant i.s ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on precedence</H3>
|
|
<P>
|
|
1. Define non-associative and right-associative infix operations
|
|
analogous to <CODE>infixl</CODE>.
|
|
</P>
|
|
<P>
|
|
2. Add a constructor that puts parentheses around expressions
|
|
to raise their precedence, but that is eliminated by a <CODE>def</CODE> definition.
|
|
Test parsing with and without a pipe to <CODE>pt -transform=compute</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Code generation as linearization</H2>
|
|
<P>
|
|
Translate arithmetic (infix) to JVM (postfix):
|
|
</P>
|
|
<PRE>
|
|
2 + 3 * 4
|
|
|
|
===>
|
|
|
|
iconst 2 : iconst 3 ; iconst 4 ; imul ; iadd
|
|
</PRE>
|
|
<P>
|
|
Just give linearization rules for JVM:
|
|
</P>
|
|
<PRE>
|
|
lin
|
|
EPlus = postfix "iadd" ;
|
|
EMinus = postfix "isub" ;
|
|
ETimes = postfix "imul" ;
|
|
EDiv = postfix "idiv" ;
|
|
EInt i = ss ("iconst" ++ i.s) ;
|
|
oper
|
|
postfix : Str -> SS -> SS -> SS = \op,x,y ->
|
|
ss (x.s ++ ";" ++ y.s ++ ";" ++ op) ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Programs with variables</H3>
|
|
<P>
|
|
A <B>straight code</B> programming language, with
|
|
<B>initializations</B> and <B>assignments</B>:
|
|
</P>
|
|
<PRE>
|
|
int x = 2 + 3 ;
|
|
int y = x + 1 ;
|
|
x = x + 9 * y ;
|
|
</PRE>
|
|
<P>
|
|
We define programs by the following constructors:
|
|
</P>
|
|
<PRE>
|
|
fun
|
|
PEmpty : Prog ;
|
|
PInit : Exp -> (Var -> Prog) -> Prog ;
|
|
PAss : Var -> Exp -> Prog -> Prog ;
|
|
</PRE>
|
|
<P>
|
|
<CODE>PInit</CODE> uses higher-order abstract syntax for making the
|
|
initialized variable available in the <B>continuation</B> of the program.
|
|
</P>
|
|
<P>
|
|
The abstract syntax tree for the above code is
|
|
</P>
|
|
<PRE>
|
|
PInit (EPlus (EInt 2) (EInt 3)) (\x ->
|
|
PInit (EPlus (EVar x) (EInt 1)) (\y ->
|
|
PAss x (EPlus (EVar x) (ETimes (EInt 9) (EVar y)))
|
|
PEmpty))
|
|
</PRE>
|
|
<P>
|
|
No uninitialized variables are allowed - there are no constructors for <CODE>Var</CODE>!
|
|
But we do have the rule
|
|
</P>
|
|
<PRE>
|
|
fun EVar : Var -> Exp ;
|
|
</PRE>
|
|
<P>
|
|
The rest of the grammar is just the same as for arithmetic expressions
|
|
<a href="#secprecedence">here</a>. The best way to implement it is perhaps by writing a
|
|
module that extends the expression module. The most natural start category
|
|
of the extension is <CODE>Prog</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exercises on code generation</H3>
|
|
<P>
|
|
1. Define a C-like concrete syntax of the straight-code language.
|
|
</P>
|
|
<P>
|
|
2. Extend the straight-code language to expressions of type <CODE>float</CODE>.
|
|
To guarantee type safety, you can define a category <CODE>Typ</CODE> of types, and
|
|
make <CODE>Exp</CODE> and <CODE>Var</CODE> dependent on <CODE>Typ</CODE>. Basic floating point expressions
|
|
can be formed from literal of the built-in GF type <CODE>Float</CODE>. The arithmetic
|
|
operations should be made polymorphic (as <a href="#secpolymorphic">here</a>).
|
|
</P>
|
|
<P>
|
|
3. Extend JVM generation to the straight-code language, using
|
|
two more instructions
|
|
</P>
|
|
<UL>
|
|
<LI><CODE>iload</CODE> <I>x</I>, which loads the value of the variable <I>x</I>
|
|
<LI><CODE>istore</CODE> <I>x</I> which stores a value to the variable <I>x</I>
|
|
</UL>
|
|
|
|
<P>
|
|
Thus the code for the example in the previous section is
|
|
</P>
|
|
<PRE>
|
|
iconst 2 ; iconst 3 ; iadd ; istore x ;
|
|
iload x ; iconst 1 ; iadd ; istore y ;
|
|
iload x ; iconst 9 ; iload y ; imul ; iadd ; istore x ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
4. If you made the exercise of adding floating point numbers to
|
|
the language, you can now cash out the main advantage of type checking
|
|
for code generation: selecting type-correct JVM instructions. The floating
|
|
point instructions are precisely the same as the integer one, except that
|
|
the prefix is <CODE>f</CODE> instead of <CODE>i</CODE>, and that <CODE>fconst</CODE> takes floating
|
|
point literals as arguments.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H1>Lesson 7: Embedded grammars</H1>
|
|
<P>
|
|
<a name="chapeight"></a>
|
|
</P>
|
|
<P>
|
|
Goals:
|
|
</P>
|
|
<UL>
|
|
<LI>use grammars as parts of programs written in Haskell and JavaScript
|
|
<LI>implement stand-alone question-answering systems and translators based on
|
|
GF grammars
|
|
<LI>generate language models for speech recognition from GF grammars
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Functionalities of an embedded grammar format</H2>
|
|
<P>
|
|
GF grammars can be used as parts of programs written in other programming
|
|
languages, to be called <B>host languages</B>.
|
|
This facility is based on several components:
|
|
</P>
|
|
<UL>
|
|
<LI>PGF: a portable format for multilingual GF grammars
|
|
<LI>a PGF interpreter written in the host language
|
|
<LI>a library in the host language that enables calling the interpreter
|
|
<LI>a way to manipulate abstract syntax trees in the host language
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>The portable grammar format</H2>
|
|
<P>
|
|
The portable format is called PGF, "Portable Grammar Format".
|
|
</P>
|
|
<P>
|
|
This format is produced by the GF batch compiler <CODE>gf</CODE>,
|
|
executable from the operative system shell:
|
|
</P>
|
|
<PRE>
|
|
% gf --make SOURCE.gf
|
|
</PRE>
|
|
<P>
|
|
PGF is the recommended format in
|
|
which final grammar products are distributed, because they
|
|
are stripped from superfluous information and can be started and applied
|
|
faster than sets of separate modules.
|
|
</P>
|
|
<P>
|
|
Application programmers have never any need to read or modify PGF files.
|
|
</P>
|
|
<P>
|
|
PGF thus plays the same role as machine code in
|
|
general-purpose programming (or bytecode in Java).
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Haskell: the EmbedAPI module</H3>
|
|
<P>
|
|
The Haskell API contains (among other things) the following types and functions:
|
|
</P>
|
|
<PRE>
|
|
readPGF :: FilePath -> IO PGF
|
|
|
|
linearize :: PGF -> Language -> Tree -> String
|
|
parse :: PGF -> Language -> Category -> String -> [Tree]
|
|
|
|
linearizeAll :: PGF -> Tree -> [String]
|
|
linearizeAllLang :: PGF -> Tree -> [(Language,String)]
|
|
|
|
parseAll :: PGF -> Category -> String -> [[Tree]]
|
|
parseAllLang :: PGF -> Category -> String -> [(Language,[Tree])]
|
|
|
|
languages :: PGF -> [Language]
|
|
categories :: PGF -> [Category]
|
|
startCat :: PGF -> Category
|
|
</PRE>
|
|
<P>
|
|
This is the only module that needs to be imported in the Haskell application.
|
|
It is available as a part of the GF distribution, in the file
|
|
<CODE>src/PGF.hs</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>First application: a translator</H3>
|
|
<P>
|
|
Let us first build a stand-alone translator, which can translate
|
|
in any multilingual grammar between any languages in the grammar.
|
|
</P>
|
|
<PRE>
|
|
module Main where
|
|
|
|
import PGF
|
|
import System (getArgs)
|
|
|
|
main :: IO ()
|
|
main = do
|
|
file:_ <- getArgs
|
|
gr <- readPGF file
|
|
interact (translate gr)
|
|
|
|
translate :: PGF -> String -> String
|
|
translate gr s = case parseAllLang gr (startCat gr) s of
|
|
(lg,t:_):_ -> unlines [linearize gr l t | l <- languages gr, l /= lg]
|
|
_ -> "NO PARSE"
|
|
</PRE>
|
|
<P>
|
|
To run the translator, first compile it by
|
|
</P>
|
|
<PRE>
|
|
% ghc --make -o trans Translator.hs
|
|
</PRE>
|
|
<P>
|
|
For this, you need the Haskell compiler <A HREF="http://www.haskell.org/ghc">GHC</A>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Producing PGF for the translator</H3>
|
|
<P>
|
|
Then produce a PGF file. For instance, the <CODE>Food</CODE> grammar set can be
|
|
compiled as follows:
|
|
</P>
|
|
<PRE>
|
|
% gf --make FoodEng.gf FoodIta.gf
|
|
</PRE>
|
|
<P>
|
|
This produces the file <CODE>Food.pgf</CODE> (its name comes from the abstract syntax).
|
|
</P>
|
|
<P>
|
|
The Haskell library function <CODE>interact</CODE> makes the <CODE>trans</CODE> program work
|
|
like a Unix filter, which reads from standard input and writes to standard
|
|
output. Therefore it can be a part of a pipe and read and write files.
|
|
The simplest way to translate is to <CODE>echo</CODE> input to the program:
|
|
</P>
|
|
<PRE>
|
|
% echo "this wine is delicious" | ./trans Food.pgf
|
|
questo vino è delizioso
|
|
</PRE>
|
|
<P>
|
|
The result is given in all languages except the input language.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>A translator loop</H3>
|
|
<P>
|
|
To avoid starting the translator over and over again:
|
|
change <CODE>interact</CODE> in the main function to <CODE>loop</CODE>, defined as
|
|
follows:
|
|
</P>
|
|
<PRE>
|
|
loop :: (String -> String) -> IO ()
|
|
loop trans = do
|
|
s <- getLine
|
|
if s == "quit" then putStrLn "bye" else do
|
|
putStrLn $ trans s
|
|
loop trans
|
|
</PRE>
|
|
<P>
|
|
The loop keeps on translating line by line until the input line
|
|
is <CODE>quit</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>A question-answer system</H3>
|
|
<P>
|
|
<a name="secmathprogram"></a>
|
|
</P>
|
|
<P>
|
|
The next application is also a translator, but it adds a
|
|
<B>transfer</B> component - a function that transforms syntax trees.
|
|
</P>
|
|
<P>
|
|
The transfer function we use is one that computes a question into an answer.
|
|
</P>
|
|
<P>
|
|
The program accepts simple questions about arithmetic and answers
|
|
"yes" or "no" in the language in which the question was made:
|
|
</P>
|
|
<PRE>
|
|
Is 123 prime?
|
|
No.
|
|
77 est impair ?
|
|
Oui.
|
|
</PRE>
|
|
<P>
|
|
We change the pure translator by giving
|
|
the <CODE>translate</CODE> function the transfer as an extra argument:
|
|
</P>
|
|
<PRE>
|
|
translate :: (Tree -> Tree) -> PGF -> String -> String
|
|
</PRE>
|
|
<P>
|
|
Ordinary translation as a special case where
|
|
transfer is the identity function (<CODE>id</CODE> in Haskell).
|
|
</P>
|
|
<P>
|
|
To reply in the <I>same</I> language as the question:
|
|
</P>
|
|
<PRE>
|
|
translate tr gr = case parseAllLang gr (startCat gr) s of
|
|
(lg,t:_):_ -> linearize gr lg (tr t)
|
|
_ -> "NO PARSE"
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Abstract syntax of the query system</H3>
|
|
<P>
|
|
Input: abstract syntax judgements
|
|
</P>
|
|
<PRE>
|
|
abstract Query = {
|
|
|
|
flags startcat=Question ;
|
|
|
|
cat
|
|
Answer ; Question ; Object ;
|
|
|
|
fun
|
|
Even : Object -> Question ;
|
|
Odd : Object -> Question ;
|
|
Prime : Object -> Question ;
|
|
Number : Int -> Object ;
|
|
|
|
Yes : Answer ;
|
|
No : Answer ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Exporting GF datatypes to Haskell</H3>
|
|
<P>
|
|
To make it easy to define a transfer function, we export the
|
|
abstract syntax to a system of Haskell datatypes:
|
|
</P>
|
|
<PRE>
|
|
% gf --output-format=haskell Query.pgf
|
|
</PRE>
|
|
<P>
|
|
It is also possible to produce the Haskell file together with PGF, by
|
|
</P>
|
|
<PRE>
|
|
% gf --make --output-format=haskell QueryEng.gf
|
|
</PRE>
|
|
<P>
|
|
The result is a file named <CODE>Query.hs</CODE>, containing a
|
|
module named <CODE>Query</CODE>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<P>
|
|
Output: Haskell definitions
|
|
</P>
|
|
<PRE>
|
|
module Query where
|
|
import PGF
|
|
|
|
data GAnswer =
|
|
GYes
|
|
| GNo
|
|
|
|
data GObject = GNumber GInt
|
|
|
|
data GQuestion =
|
|
GPrime GObject
|
|
| GOdd GObject
|
|
| GEven GObject
|
|
|
|
newtype GInt = GInt Integer
|
|
</PRE>
|
|
<P>
|
|
All type and constructor names are prefixed with a <CODE>G</CODE> to prevent clashes.
|
|
</P>
|
|
<P>
|
|
The Haskell module name is the same as the abstract syntax name.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>The question-answer function</H3>
|
|
<P>
|
|
Haskell's type checker guarantees that the functions are well-typed also with
|
|
respect to GF.
|
|
</P>
|
|
<PRE>
|
|
answer :: GQuestion -> GAnswer
|
|
answer p = case p of
|
|
GOdd x -> test odd x
|
|
GEven x -> test even x
|
|
GPrime x -> test prime x
|
|
|
|
value :: GObject -> Int
|
|
value e = case e of
|
|
GNumber (GInt i) -> fromInteger i
|
|
|
|
test :: (Int -> Bool) -> GObject -> GAnswer
|
|
test f x = if f (value x) then GYes else GNo
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Converting between Haskell and GF trees</H3>
|
|
<P>
|
|
The generated Haskell module also contains
|
|
</P>
|
|
<PRE>
|
|
class Gf a where
|
|
gf :: a -> Tree
|
|
fg :: Tree -> a
|
|
|
|
instance Gf GQuestion where
|
|
gf (GEven x1) = DTr [] (AC (CId "Even")) [gf x1]
|
|
gf (GOdd x1) = DTr [] (AC (CId "Odd")) [gf x1]
|
|
gf (GPrime x1) = DTr [] (AC (CId "Prime")) [gf x1]
|
|
fg t =
|
|
case t of
|
|
DTr [] (AC (CId "Even")) [x1] -> GEven (fg x1)
|
|
DTr [] (AC (CId "Odd")) [x1] -> GOdd (fg x1)
|
|
DTr [] (AC (CId "Prime")) [x1] -> GPrime (fg x1)
|
|
_ -> error ("no Question " ++ show t)
|
|
</PRE>
|
|
<P>
|
|
For the programmer, it is enougo to know:
|
|
</P>
|
|
<UL>
|
|
<LI>all GF names are in Haskell prefixed with <CODE>G</CODE>
|
|
<LI><CODE>gf</CODE> translates from Haskell objects to GF trees
|
|
<LI><CODE>fg</CODE> translates from GF trees to Haskell objects
|
|
</UL>
|
|
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Putting it all together: the transfer definition</H3>
|
|
<PRE>
|
|
module TransferDef where
|
|
|
|
import PGF (Tree)
|
|
import Query -- generated from GF
|
|
|
|
transfer :: Tree -> Tree
|
|
transfer = gf . answer . fg
|
|
|
|
answer :: GQuestion -> GAnswer
|
|
answer p = case p of
|
|
GOdd x -> test odd x
|
|
GEven x -> test even x
|
|
GPrime x -> test prime x
|
|
|
|
value :: GObject -> Int
|
|
value e = case e of
|
|
GNumber (GInt i) -> fromInteger i
|
|
|
|
test :: (Int -> Bool) -> GObject -> GAnswer
|
|
test f x = if f (value x) then GYes else GNo
|
|
|
|
prime :: Int -> Bool
|
|
prime x = elem x primes where
|
|
primes = sieve [2 .. x]
|
|
sieve (p:xs) = p : sieve [ n | n <- xs, n `mod` p > 0 ]
|
|
sieve [] = []
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Putting it all together: the Main module</H3>
|
|
<P>
|
|
Here is the complete code in the Haskell file <CODE>TransferLoop.hs</CODE>.
|
|
</P>
|
|
<PRE>
|
|
module Main where
|
|
|
|
import PGF
|
|
import TransferDef (transfer)
|
|
|
|
main :: IO ()
|
|
main = do
|
|
gr <- readPGF "Query.pgf"
|
|
loop (translate transfer gr)
|
|
|
|
loop :: (String -> String) -> IO ()
|
|
loop trans = do
|
|
s <- getLine
|
|
if s == "quit" then putStrLn "bye" else do
|
|
putStrLn $ trans s
|
|
loop trans
|
|
|
|
translate :: (Tree -> Tree) -> PGF -> String -> String
|
|
translate tr gr s = case parseAllLang gr (startCat gr) s of
|
|
(lg,t:_):_ -> linearize gr lg (tr t)
|
|
_ -> "NO PARSE"
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Putting it all together: the Makefile</H3>
|
|
<P>
|
|
To automate the production of the system, we write a <CODE>Makefile</CODE> as follows:
|
|
</P>
|
|
<PRE>
|
|
all:
|
|
gf --make --output-format=haskell QueryEng
|
|
ghc --make -o ./math TransferLoop.hs
|
|
strip math
|
|
</PRE>
|
|
<P>
|
|
(The empty segments starting the command lines in a Makefile must be tabs.)
|
|
Now we can compile the whole system by just typing
|
|
</P>
|
|
<PRE>
|
|
make
|
|
</PRE>
|
|
<P>
|
|
Then you can run it by typing
|
|
</P>
|
|
<PRE>
|
|
./math
|
|
</PRE>
|
|
<P>
|
|
Just to summarize, the source of the application consists of the following files:
|
|
</P>
|
|
<PRE>
|
|
Makefile -- a makefile
|
|
Math.gf -- abstract syntax
|
|
Math???.gf -- concrete syntaxes
|
|
TransferDef.hs -- definition of question-to-answer function
|
|
TransferLoop.hs -- Haskell Main module
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Web server applications</H2>
|
|
<P>
|
|
PGF files can be used in web servers, for which there is a Haskell library included
|
|
in <CODE>src/server/</CODE>. How to build a server for tasks like translators is explained
|
|
in the <A HREF="../src/server/README"><CODE>README</CODE></A> file in that directory.
|
|
</P>
|
|
<P>
|
|
One of the servers that can be readily built with the library (without any
|
|
programming required) is <B>fridge poetry magnets</B>. It is an application that
|
|
uses an incremental parser to suggest grammatically correct next words. Here
|
|
is an example of its application to the <CODE>Foods</CODE> grammars.
|
|
</P>
|
|
<P>
|
|
<IMG ALIGN="middle" SRC="food-magnet.png" BORDER="0" ALT="">
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>JavaScript applications</H2>
|
|
<P>
|
|
JavaScript is a programming language that has interpreters built in in most
|
|
web browsers. It is therefore usable for client side web programs, which can even
|
|
be run without access to the internet. The following figure shows a JavaScript
|
|
program compiled from GF grammars as run on an iPhone.
|
|
</P>
|
|
<P>
|
|
<IMG ALIGN="middle" SRC="iphone.jpg" BORDER="0" ALT="">
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Compiling to JavaScript</H3>
|
|
<P>
|
|
JavaScript is one of the output formats of the GF batch compiler. Thus the following
|
|
command generates a JavaScript file from two <CODE>Food</CODE> grammars.
|
|
</P>
|
|
<PRE>
|
|
% gf --make --output-format=js FoodEng.gf FoodIta.gf
|
|
</PRE>
|
|
<P>
|
|
The name of the generated file is <CODE>Food.js</CODE>, derived from the top-most abstract
|
|
syntax name. This file contains the multilingual grammar as a JavaScript object.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>Using the JavaScript grammar</H3>
|
|
<P>
|
|
To perform parsing and linearization, the run-time library
|
|
<CODE>gflib.js</CODE> is used. It is included in <CODE>GF/lib/javascript/</CODE>, together with
|
|
some other JavaScript and HTML files; these files can be used
|
|
as templates for building applications.
|
|
</P>
|
|
<P>
|
|
An example of usage is
|
|
<A HREF="http://grammaticalframework.org:41296"><CODE>translator.html</CODE></A>,
|
|
which is in fact initialized with
|
|
a pointer to the Food grammar, so that it provides translation between the English
|
|
and Italian grammars:
|
|
</P>
|
|
<P>
|
|
<IMG ALIGN="middle" SRC="food-js.png" BORDER="0" ALT="">
|
|
</P>
|
|
<P>
|
|
The grammar must have the name <CODE>grammar.js</CODE>. The abstract syntax and start
|
|
category names in <CODE>translator.html</CODE> must match the ones in the grammar.
|
|
With these changes, the translator works for any multilingual grammar.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H2>Language models for speech recognition</H2>
|
|
<P>
|
|
The standard way of using GF in speech recognition is by building
|
|
<B>grammar-based language models</B>.
|
|
</P>
|
|
<P>
|
|
GF supports several formats, including
|
|
GSL, the formatused in the <A HREF="http://www.nuance.com">Nuance speech recognizer</A>.
|
|
</P>
|
|
<P>
|
|
GSL is produced from GF by running <CODE>gf</CODE> with the flag
|
|
<CODE>--output-format=gsl</CODE>.
|
|
</P>
|
|
<P>
|
|
Example: GSL generated from <CODE>FoodsEng.gf</CODE>.
|
|
</P>
|
|
<PRE>
|
|
% gf --make --output-format=gsl FoodsEng.gf
|
|
% more FoodsEng.gsl
|
|
|
|
;GSL2.0
|
|
; Nuance speech recognition grammar for FoodsEng
|
|
; Generated by GF
|
|
|
|
.MAIN Phrase_cat
|
|
|
|
Item_1 [("that" Kind_1) ("this" Kind_1)]
|
|
Item_2 [("these" Kind_2) ("those" Kind_2)]
|
|
Item_cat [Item_1 Item_2]
|
|
Kind_1 ["cheese" "fish" "pizza" (Quality_1 Kind_1)
|
|
"wine"]
|
|
Kind_2 ["cheeses" "fish" "pizzas"
|
|
(Quality_1 Kind_2) "wines"]
|
|
Kind_cat [Kind_1 Kind_2]
|
|
Phrase_1 [(Item_1 "is" Quality_1)
|
|
(Item_2 "are" Quality_1)]
|
|
Phrase_cat Phrase_1
|
|
|
|
Quality_1 ["boring" "delicious" "expensive"
|
|
"fresh" "italian" ("very" Quality_1) "warm"]
|
|
Quality_cat Quality_1
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<H3>More speech recognition grammar formats</H3>
|
|
<P>
|
|
Other formats available via the <CODE>--output-format</CODE> flag include:
|
|
</P>
|
|
<TABLE ALIGN="center" CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Format</TH>
|
|
<TH COLSPAN="2">Description</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>gsl</CODE></TD>
|
|
<TD>Nuance GSL speech recognition grammar</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>jsgf</CODE></TD>
|
|
<TD>Java Speech Grammar Format (JSGF)</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>jsgf_sisr_old</CODE></TD>
|
|
<TD>JSGF with semantic tags in SISR WD 20030401 format</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>srgs_abnf</CODE></TD>
|
|
<TD>SRGS ABNF format</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>srgs_xml</CODE></TD>
|
|
<TD>SRGS XML format</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>srgs_xml_prob</CODE></TD>
|
|
<TD>SRGS XML format, with weights</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>slf</CODE></TD>
|
|
<TD>finite automaton in the HTK SLF format</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>slf_sub</CODE></TD>
|
|
<TD>finite automaton with sub-automata in HTK SLF</TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P>
|
|
All currently available formats can be seen with <CODE>gf --help</CODE>.
|
|
</P>
|
|
|
|
<!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) -->
|
|
<!-- cmdline: txt2tags gf-tutorial.txt -->
|
|
</BODY></HTML>
|