<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>Grammatical Framework Tutorial</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">

<P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Sun Dec 18 22:29:50 2005
</FONT></CENTER>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">GF = Grammatical Framework</A>
<UL>
<LI><A HREF="#toc2">Getting the GF program</A>
</UL>
<LI><A HREF="#toc3">The <CODE>.cf</CODE> grammar format</A>
<UL>
<LI><A HREF="#toc4">Importing grammars and parsing strings</A>
<LI><A HREF="#toc5">Generating trees and strings</A>
<LI><A HREF="#toc6">Visualizing trees</A>
<LI><A HREF="#toc7">Some random-generated sentences</A>
<LI><A HREF="#toc8">Systematic generation</A>
<LI><A HREF="#toc9">More on pipes; tracing</A>
<LI><A HREF="#toc10">Writing and reading files</A>
<LI><A HREF="#toc11">Labelled context-free grammars</A>
<LI><A HREF="#toc12">The labelled context-free format</A>
</UL>
<LI><A HREF="#toc13">The <CODE>.gf</CODE> grammar format</A>
<UL>
<LI><A HREF="#toc14">Abstract and concrete syntax</A>
<LI><A HREF="#toc15">Judgement forms</A>
<LI><A HREF="#toc16">Module types</A>
<LI><A HREF="#toc17">Record types, records, and <CODE>Str</CODE>s</A>
<LI><A HREF="#toc18">An abstract syntax example</A>
<LI><A HREF="#toc19">A concrete syntax example</A>
<LI><A HREF="#toc20">Modules and files</A>
</UL>
<LI><A HREF="#toc21">Multilingual grammars and translation</A>
<UL>
<LI><A HREF="#toc22">An Italian concrete syntax</A>
<LI><A HREF="#toc23">Using a multilingual grammar</A>
<LI><A HREF="#toc24">Translation session</A>
<LI><A HREF="#toc25">Translation quiz</A>
</UL>
<LI><A HREF="#toc26">Grammar architecture</A>
<UL>
<LI><A HREF="#toc27">Extending a grammar</A>
<LI><A HREF="#toc28">Multiple inheritance</A>
<LI><A HREF="#toc29">Visualizing module structure</A>
</UL>
<LI><A HREF="#toc30">System commands</A>
<LI><A HREF="#toc31">Resource modules</A>
<UL>
<LI><A HREF="#toc32">The golden rule of functional programming</A>
<LI><A HREF="#toc33">Operation definitions</A>
<LI><A HREF="#toc34">The <CODE>resource</CODE> module type</A>
<LI><A HREF="#toc35">Opening a <CODE>resource</CODE></A>
<LI><A HREF="#toc36">Division of labour</A>
</UL>
<LI><A HREF="#toc37">Morphology</A>
<UL>
<LI><A HREF="#toc38">Parameters and tables</A>
<LI><A HREF="#toc39">Inflection tables, paradigms, and <CODE>oper</CODE> definitions</A>
<LI><A HREF="#toc40">Worst-case macros and data abstraction</A>
<LI><A HREF="#toc41">A system of paradigms using <CODE>Prelude</CODE> operations</A>
<LI><A HREF="#toc42">An intelligent noun paradigm using <CODE>case</CODE> expressions</A>
<LI><A HREF="#toc43">Pattern matching</A>
<LI><A HREF="#toc44">Morphological <CODE>resource</CODE> modules</A>
<LI><A HREF="#toc45">Testing <CODE>resource</CODE> modules</A>
</UL>
<LI><A HREF="#toc46">Using morphology in concrete syntax</A>
<UL>
<LI><A HREF="#toc47">Parametric vs. inherent features, agreement</A>
<LI><A HREF="#toc48">English concrete syntax with parameters</A>
<LI><A HREF="#toc49">Hierarchic parameter types</A>
<LI><A HREF="#toc50">Morphological analysis and morphology quiz</A>
<LI><A HREF="#toc51">Discontinuous constituents</A>
</UL>
<LI><A HREF="#toc52">More constructs for concrete syntax</A>
<UL>
<LI><A HREF="#toc53">Free variation</A>
<LI><A HREF="#toc54">Record extension and subtyping</A>
<LI><A HREF="#toc55">Tuples and product types</A>
<LI><A HREF="#toc56">Prefix-dependent choices</A>
<LI><A HREF="#toc57">Predefined types and operations</A>
</UL>
<LI><A HREF="#toc58">More features of the module system</A>
<UL>
<LI><A HREF="#toc59">Resource grammars and their reuse</A>
<LI><A HREF="#toc60">Interfaces, instances, and functors</A>
<LI><A HREF="#toc61">Restricted inheritance and qualified opening</A>
</UL>
<LI><A HREF="#toc62">More concepts of abstract syntax</A>
<UL>
<LI><A HREF="#toc63">Dependent types</A>
<LI><A HREF="#toc64">Higher-order abstract syntax</A>
<LI><A HREF="#toc65">Semantic definitions</A>
</UL>
<LI><A HREF="#toc66">Transfer modules</A>
<LI><A HREF="#toc67">Practical issues</A>
<UL>
<LI><A HREF="#toc68">Lexers and unlexers</A>
<LI><A HREF="#toc69">Efficiency of grammars</A>
<LI><A HREF="#toc70">Speech input and output</A>
<LI><A HREF="#toc71">Multilingual syntax editor</A>
<LI><A HREF="#toc72">Interactive Development Environment (IDE)</A>
<LI><A HREF="#toc73">Communicating with GF</A>
<LI><A HREF="#toc74">Embedded grammars in Haskell, Java, and Prolog</A>
<LI><A HREF="#toc75">Alternative input and output grammar formats</A>
</UL>
<LI><A HREF="#toc76">Case studies</A>
<UL>
<LI><A HREF="#toc77">Interfacing formal and natural languages</A>
</UL>
</UL>
<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
<IMG ALIGN="middle" SRC="../gf-logo.gif" BORDER="0" ALT="">
</P>
<A NAME="toc1"></A>
<H2>GF = Grammatical Framework</H2>
<P>
The term GF is used for different things:
</P>
<UL>
<LI>a <B>program</B> used for working with grammars
<LI>a <B>programming language</B> in which grammars can be written
<LI>a <B>theory</B> about grammars and languages
</UL>

<P>
This tutorial is primarily about the GF program and
the GF programming language.
It will guide you
</P>
<UL>
<LI>to use the GF program
<LI>to write GF grammars
<LI>to write programs in which GF grammars are used as components
</UL>

<A NAME="toc2"></A>
<H3>Getting the GF program</H3>
<P>
The program is open-source free software, which you can download via the
GF Homepage:
<A HREF="http://www.cs.chalmers.se/~aarne/GF"><CODE>http://www.cs.chalmers.se/~aarne/GF</CODE></A>
</P>
<P>
There you can download
</P>
<UL>
<LI>binaries for Linux, Solaris, Macintosh, and Windows
<LI>source code and documentation
<LI>grammar libraries and examples
</UL>

<P>
If you want to compile GF from source, you need Haskell and Java
compilers. But normally you don't have to compile, and you definitely
don't need to know Haskell or Java to use GF.
</P>
<P>
To start the GF program, assuming you have installed it, just type
</P>
<PRE>
% gf
</PRE>
<P>
in the shell. You will see GF's welcome message and the prompt <CODE>></CODE>.
The command
</P>
<PRE>
> help
</PRE>
<P>
will give you a list of available commands.
</P>
<P>
As a common convention in this Tutorial, we will use
</P>
<UL>
<LI><CODE>%</CODE> as a prompt that marks system commands
<LI><CODE>></CODE> as a prompt that marks GF commands
</UL>

<P>
Thus you should not type these prompts, but only the lines that
follow them.
</P>
<A NAME="toc3"></A>
<H2>The <CODE>.cf</CODE> grammar format</H2>
<P>
Now you are ready to try out your first grammar.
We start with one that is not written in the GF language, but
in the ubiquitous BNF notation (Backus Naur Form), which GF can also
understand. Type (or copy) the following lines into a file named
<CODE>food.cf</CODE>:
</P>
<PRE>
S ::= Item "is" Quality ;
Item ::= "this" Kind | "that" Kind ;
Kind ::= Quality Kind ;
Kind ::= "wine" | "cheese" | "fish" ;
Quality ::= "very" Quality ;
Quality ::= "fresh" | "warm" | "Italian" | "expensive" | "delicious" | "boring" ;
</PRE>
<P>
This grammar defines a set of phrases usable to speak about food.
It builds <B>sentences</B> (<CODE>S</CODE>) by assigning <CODE>Quality</CODE>s to
<CODE>Item</CODE>s. The grammar shows a typical character of GF grammars:
they are small grammars describing some more or less well-defined
domain, such as, in this case, food.
</P>
<A NAME="toc4"></A>
<H3>Importing grammars and parsing strings</H3>
<P>
The first GF command when using a grammar is to <B>import</B> it.
The command has a long name, <CODE>import</CODE>, and a short name, <CODE>i</CODE>.
You can type either
</P>
<PRE>
> import food.cf
</PRE>
<P>
or
</P>
<PRE>
> i food.cf
</PRE>
<P>
to get the same effect.
The effect is that the GF program <B>compiles</B> your grammar into an internal
representation, and shows a new prompt when it is ready.
</P>
<P>
You can now use GF for <B>parsing</B>:
</P>
<PRE>
> parse "this cheese is delicious"
S_Item_is_Quality (Item_this_Kind Kind_cheese) Quality_delicious

> p "that wine is very very Italian"
S_Item_is_Quality (Item_that_Kind Kind_wine)
  (Quality_very_Quality (Quality_very_Quality Quality_Italian))
</PRE>
<P>
The <CODE>parse</CODE> (= <CODE>p</CODE>) command takes a <B>string</B>
(in double quotes) and returns an <B>abstract syntax tree</B> - the thing
beginning with <CODE>S_Item_is_Quality</CODE>. We will see soon how to make sense
of abstract syntax trees - for now you should just notice that the tree
is different for the two strings.
</P>
<P>
Strings that return a tree when parsed do so in virtue of the grammar
you imported. Try parsing something else, and you will fail:
</P>
<PRE>
> p "hello world"
No success in cf parsing hello world
no tree found
</PRE>
<P></P>
<A NAME="toc5"></A>
<H3>Generating trees and strings</H3>
<P>
You can also use GF for <B>linearizing</B>
(<CODE>linearize = l</CODE>). This is the inverse of
parsing, taking trees into strings:
</P>
<PRE>
> linearize S_Item_is_Quality (Item_that_Kind Kind_wine) Quality_warm
that wine is warm
</PRE>
<P>
What is the use of this? Typically not that you type in a tree at
the GF prompt. The utility of linearization comes from the fact that
you can obtain a tree from somewhere else. One way to do so is
<B>random generation</B> (<CODE>generate_random = gr</CODE>):
</P>
<PRE>
> generate_random
S_Item_is_Quality (Item_this_Kind Kind_wine) Quality_delicious
</PRE>
<P>
Now you can copy the tree and paste it to the <CODE>linearize</CODE> command.
Or, more efficiently, feed random generation into linearization by using
a <B>pipe</B>:
</P>
<PRE>
> gr | l
this fresh cheese is delicious
</PRE>
<P></P>
<A NAME="toc6"></A>
<H3>Visualizing trees</H3>
<P>
The parenthesized code returned by the parser does not
look much like a tree. Why is it called one? Trees are a data structure
representing <B>nesting</B>: trees are branching entities, and the branches
are themselves trees. Parentheses give a linear representation of trees,
useful for the computer. But the human eye may prefer to see a visualization;
for this purpose, GF provides the command <CODE>visualize_tree = vt</CODE>, to which
parsing (and any other tree-producing command) can be piped:
</P>
<PRE>
> parse "this delicious cheese is very Italian" | vt
</PRE>
<P></P>
<P>
<IMG ALIGN="middle" SRC="Tree.png" BORDER="0" ALT="">
</P>
<A NAME="toc7"></A>
<H3>Some random-generated sentences</H3>
<P>
Random generation can be quite amusing. So you may want to
generate ten strings with one and the same command:
</P>
<PRE>
> gr -number=10 | l
that wine is boring
that fresh cheese is fresh
that cheese is very boring
this cheese is Italian
that expensive cheese is expensive
that fish is fresh
that wine is very Italian
this wine is Italian
this cheese is boring
this fish is boring
</PRE>
<P></P>
<A NAME="toc8"></A>
<H3>Systematic generation</H3>
<P>
To generate <I>all</I> sentences that a grammar
can generate, use the command <CODE>generate_trees = gt</CODE>:
</P>
<PRE>
> generate_trees | l
that cheese is very Italian
that cheese is very boring
that cheese is very delicious
that cheese is very expensive
that cheese is very fresh
...
this wine is expensive
this wine is fresh
this wine is warm
</PRE>
<P>
You get quite a few trees, but not all of them: only trees up to a given
<B>depth</B>. To see how you can get more, use the
<CODE>help = h</CODE> command:
</P>
<PRE>
> help gt
</PRE>
<P>
<B>Quiz</B>. If the command <CODE>gt</CODE> generated all
trees in your grammar, it would never terminate. Why?
</P>
<A NAME="toc9"></A>
<H3>More on pipes; tracing</H3>
<P>
A pipe of GF commands can have any length, but the "output type"
(either string or tree) of one command must always match the "input type"
of the next command.
</P>
<P>
The intermediate results in a pipe can be observed by giving the
<B>tracing</B> flag <CODE>-tr</CODE> to each command whose output you
want to see:
</P>
<PRE>
> gr -tr | l -tr | p

S_Item_is_Quality (Item_this_Kind Kind_cheese) Quality_boring
this cheese is boring
S_Item_is_Quality (Item_this_Kind Kind_cheese) Quality_boring
</PRE>
<P>
This facility is good for testing purposes: for instance, you
may want to see if a grammar is <B>ambiguous</B>, i.e.
contains strings that can be parsed in more than one way.
</P>
<A NAME="toc10"></A>
<H3>Writing and reading files</H3>
<P>
To save the output of GF commands into a file, you can
pipe it to the <CODE>write_file = wf</CODE> command:
</P>
<PRE>
> gr -number=10 | l | write_file exx.tmp
</PRE>
<P>
You can read the file back to GF with the
<CODE>read_file = rf</CODE> command:
</P>
<PRE>
> read_file exx.tmp | p -lines
</PRE>
<P>
Notice the flag <CODE>-lines</CODE> given to the parsing
command. This flag tells GF to parse each line of
the file separately. Without the flag, the grammar could
not recognize the string in the file, because it is not
a single sentence but a sequence of ten sentences.
</P>
<A NAME="toc11"></A>
<H3>Labelled context-free grammars</H3>
<P>
The syntax trees returned by GF's parser in the previous examples
are not so nice to look at. Identifiers of the form <CODE>S_Item_is_Quality</CODE>
are <B>labels</B> of the BNF rules. To see which label corresponds to
which rule, you can use the <CODE>print_grammar = pg</CODE> command
with the <CODE>printer</CODE> flag set to <CODE>cf</CODE> (which means context-free):
</P>
<PRE>
> print_grammar -printer=cf

S_Item_is_Quality. S ::= Item "is" Quality ;
Quality_Italian. Quality ::= "Italian" ;
Quality_boring. Quality ::= "boring" ;
Quality_delicious. Quality ::= "delicious" ;
Quality_expensive. Quality ::= "expensive" ;
Quality_fresh. Quality ::= "fresh" ;
Quality_very_Quality. Quality ::= "very" Quality ;
Quality_warm. Quality ::= "warm" ;
Kind_Quality_Kind. Kind ::= Quality Kind ;
Kind_cheese. Kind ::= "cheese" ;
Kind_fish. Kind ::= "fish" ;
Kind_wine. Kind ::= "wine" ;
Item_that_Kind. Item ::= "that" Kind ;
Item_this_Kind. Item ::= "this" Kind ;
</PRE>
<P>
A syntax tree such as
</P>
<PRE>
S_Item_is_Quality (Item_this_Kind Kind_wine) Quality_delicious
</PRE>
<P>
encodes the sequence of grammar rules used for building the
tree. If you look at this tree, you will notice that <CODE>Item_this_Kind</CODE>
is the label of the rule prefixing <CODE>this</CODE> to a <CODE>Kind</CODE>,
thereby forming an <CODE>Item</CODE>.
<CODE>Kind_wine</CODE> is the label of the kind <CODE>"wine"</CODE>,
and so on. These labels are formed automatically when the grammar
is compiled by GF, in a way that guarantees that different rules
get different labels.
</P>
<A NAME="toc12"></A>
<H3>The labelled context-free format</H3>
<P>
The <B>labelled context-free grammar</B> format permits user-defined
labels for each rule.
In files with the suffix <CODE>.cf</CODE>, you can prefix rules with
labels that you provide yourself - these may be more useful
than the automatically generated ones. The following is a possible
labelling of <CODE>food.cf</CODE> with nicer-looking labels.
</P>
<PRE>
Is. S ::= Item "is" Quality ;
That. Item ::= "that" Kind ;
This. Item ::= "this" Kind ;
QKind. Kind ::= Quality Kind ;
Cheese. Kind ::= "cheese" ;
Fish. Kind ::= "fish" ;
Wine. Kind ::= "wine" ;
Italian. Quality ::= "Italian" ;
Boring. Quality ::= "boring" ;
Delicious. Quality ::= "delicious" ;
Expensive. Quality ::= "expensive" ;
Fresh. Quality ::= "fresh" ;
Very. Quality ::= "very" Quality ;
Warm. Quality ::= "warm" ;
</PRE>
<P>
With this grammar, the trees look as follows:
</P>
<PRE>
> parse -tr "this delicious cheese is very Italian" | vt
Is (This (QKind Delicious Cheese)) (Very Italian)
</PRE>
<P></P>
<P>
<IMG ALIGN="middle" SRC="Tree2.png" BORDER="0" ALT="">
</P>
<A NAME="toc13"></A>
<H2>The <CODE>.gf</CODE> grammar format</H2>
<P>
To see what there is in GF's shell state when a grammar
has been imported, you can give the plain command
<CODE>print_grammar = pg</CODE>:
</P>
<PRE>
> print_grammar
</PRE>
<P>
The output is quite unreadable at this stage, and you may feel happy that
you did not need to write the grammar in that notation, but that the
GF grammar compiler produced it.
</P>
<P>
However, we will now start to demonstrate
how GF's own notation gives you
much more expressive power than the <CODE>.cf</CODE>
format. We will introduce the <CODE>.gf</CODE> format by presenting
one more way of defining the same grammar as in
<CODE>food.cf</CODE>.
Then we will show how the full GF grammar format enables you
to do things that are not possible in the weaker formats.
</P>
<A NAME="toc14"></A>
<H3>Abstract and concrete syntax</H3>
<P>
A GF grammar consists of two main parts:
</P>
<UL>
<LI><B>abstract syntax</B>, defining what syntax trees there are
<LI><B>concrete syntax</B>, defining how trees are linearized into strings
</UL>

<P>
The EBNF and CF formats fuse these two things together, but it is possible
to take them apart. For instance, the sentence formation rule
</P>
<PRE>
Is. S ::= Item "is" Quality ;
</PRE>
<P>
is interpreted as the following pair of rules:
</P>
<PRE>
fun Is : Item -> Quality -> S ;
lin Is item quality = {s = item.s ++ "is" ++ quality.s} ;
</PRE>
<P>
The former rule, with the keyword <CODE>fun</CODE>, belongs to the abstract syntax.
It defines the <B>function</B>
<CODE>Is</CODE>, which constructs syntax trees of the form
(<CODE>Is</CODE> <I>item</I> <I>quality</I>).
</P>
<P>
The latter rule, with the keyword <CODE>lin</CODE>, belongs to the concrete syntax.
It defines the <B>linearization function</B> for
syntax trees of the form (<CODE>Is</CODE> <I>item</I> <I>quality</I>).
</P>
<A NAME="toc15"></A>
<H3>Judgement forms</H3>
<P>
Rules in a GF grammar are called <B>judgements</B>, and the keywords
<CODE>fun</CODE> and <CODE>lin</CODE> are used for distinguishing between two
<B>judgement forms</B>. Here is a summary of the most important
judgement forms:
</P>
<UL>
<LI>abstract syntax
<P></P>
</UL>

<TABLE ALIGN="center" CELLPADDING="4" BORDER="1">
<TR>
<TD>form</TD>
<TD>reading</TD>
</TR>
<TR>
<TD><CODE>cat</CODE> C</TD>
<TD>C is a category</TD>
</TR>
<TR>
<TD><CODE>fun</CODE> f <CODE>:</CODE> A</TD>
<TD>f is a function of type A</TD>
</TR>
</TABLE>

<P></P>
<UL>
<LI>concrete syntax
<P></P>
</UL>

<TABLE ALIGN="center" CELLPADDING="4" BORDER="1">
<TR>
<TD>form</TD>
<TD>reading</TD>
</TR>
<TR>
<TD><CODE>lincat</CODE> C <CODE>=</CODE> T</TD>
<TD>category C has linearization type T</TD>
</TR>
<TR>
<TD><CODE>lin</CODE> f <CODE>=</CODE> t</TD>
<TD>function f has linearization t</TD>
</TR>
</TABLE>

<P></P>
<P>
We return to the precise meanings of these judgement forms later.
First we will look at how judgements are grouped into modules, and
show how the food grammar is
expressed by using modules and judgements.
</P>
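<P>
To see how the four judgement forms fit together, here is a fragment
of the food grammar written out in all four forms (the complete
modules are given in the following sections):
</P>
<PRE>
cat Quality ;                   -- Quality is a category
fun Warm : Quality ;            -- Warm is a function of type Quality
lincat Quality = {s : Str} ;    -- Quality has linearization type {s : Str}
lin Warm = {s = "warm"} ;       -- Warm has linearization {s = "warm"}
</PRE>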
<A NAME="toc16"></A>
<H3>Module types</H3>
<P>
A GF grammar consists of <B>modules</B>,
into which judgements are grouped. The most important
module forms are
</P>
<UL>
<LI><CODE>abstract</CODE> A <CODE>=</CODE> M, abstract syntax A with judgements in
the module body M.
<LI><CODE>concrete</CODE> C <CODE>of</CODE> A <CODE>=</CODE> M, concrete syntax C of the
abstract syntax A, with judgements in the module body M.
</UL>

<A NAME="toc17"></A>
<H3>Record types, records, and <CODE>Str</CODE>s</H3>
<P>
The linearization type of a category is a <B>record type</B>, with
zero or more <B>fields</B> of different types. The simplest record
type used for linearization in GF is
</P>
<PRE>
{s : Str}
</PRE>
<P>
which has one field, with <B>label</B> <CODE>s</CODE> and type <CODE>Str</CODE>.
</P>
<P>
Examples of records of this type are
</P>
<PRE>
{s = "foo"}
{s = "hello" ++ "world"}
</PRE>
<P></P>
<P>
Whenever a record <CODE>r</CODE> of type <CODE>{s : Str}</CODE> is given,
<CODE>r.s</CODE> is an object of type <CODE>Str</CODE>. This is
a special case of the <B>projection</B> rule, allowing the extraction
of fields from a record:
</P>
<UL>
<LI>if <I>r</I> : <CODE>{</CODE> ... <I>p</I> : <I>T</I> ... <CODE>}</CODE> then <I>r.p</I> : <I>T</I>
</UL>
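<P>
For instance, applying the projection rule to one of the records above:
</P>
<PRE>
{s = "hello" ++ "world"} . s    -- an object of type Str
</PRE>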
<P>
The type <CODE>Str</CODE> is really the type of <B>token lists</B>, but
most of the time one can conveniently think of it as the type of strings,
denoted by string literals in double quotes.
</P>
<P>
Notice that
</P>
<PRE>
"hello world"
</PRE>
<P>
is not recommended as an expression of type <CODE>Str</CODE>. It denotes
a token with a space in it, and will usually
not work with the lexical analysis that precedes parsing. A shorthand
exemplified by
</P>
<PRE>
["hello world and people"] === "hello" ++ "world" ++ "and" ++ "people"
</PRE>
<P>
can be used for lists of tokens. The expression
</P>
<PRE>
[]
</PRE>
<P>
denotes the empty token list.
</P>
<A NAME="toc18"></A>
<H3>An abstract syntax example</H3>
<P>
To express the abstract syntax of <CODE>food.cf</CODE> in
a file <CODE>Food.gf</CODE>, we write two kinds of judgements:
</P>
<UL>
<LI>Each category is introduced by a <CODE>cat</CODE> judgement.
<LI>Each rule label is introduced by a <CODE>fun</CODE> judgement,
with the type formed from the nonterminals of the rule.
</UL>

<PRE>
abstract Food = {

  cat
    S ; Item ; Kind ; Quality ;

  fun
    Is : Item -> Quality -> S ;
    This, That : Kind -> Item ;
    QKind : Quality -> Kind -> Kind ;
    Wine, Cheese, Fish : Kind ;
    Very : Quality -> Quality ;
    Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ;
}
</PRE>
<P>
Notice the use of shorthands permitting the sharing of
the keyword in subsequent judgements, and of the type
in subsequent <CODE>fun</CODE> judgements.
</P>
<A NAME="toc19"></A>
<H3>A concrete syntax example</H3>
<P>
Each category introduced in <CODE>Food.gf</CODE> is
given a <CODE>lincat</CODE> rule, and each
function is given a <CODE>lin</CODE> rule. Similar shorthands
apply as in <CODE>abstract</CODE> modules.
</P>
<PRE>
concrete FoodEng of Food = {

  lincat
    S, Item, Kind, Quality = {s : Str} ;

  lin
    Is item quality = {s = item.s ++ "is" ++ quality.s} ;
    This kind = {s = "this" ++ kind.s} ;
    That kind = {s = "that" ++ kind.s} ;
    QKind quality kind = {s = quality.s ++ kind.s} ;
    Wine = {s = "wine"} ;
    Cheese = {s = "cheese"} ;
    Fish = {s = "fish"} ;
    Very quality = {s = "very" ++ quality.s} ;
    Fresh = {s = "fresh"} ;
    Warm = {s = "warm"} ;
    Italian = {s = "Italian"} ;
    Expensive = {s = "expensive"} ;
    Delicious = {s = "delicious"} ;
    Boring = {s = "boring"} ;
}
</PRE>
<P></P>
<A NAME="toc20"></A>
<H3>Modules and files</H3>
<P>
Module name + <CODE>.gf</CODE> = file name
</P>
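<P>
For the modules defined above, this convention gives:
</P>
<PRE>
abstract Food       -- in file Food.gf
concrete FoodEng    -- in file FoodEng.gf
</PRE>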
<P>
Each module is compiled into a <CODE>.gfc</CODE> file.
</P>
<P>
Import <CODE>FoodEng.gf</CODE> and see what happens:
</P>
<PRE>
> i FoodEng.gf
</PRE>
<P>
The GF program does not only read the file
<CODE>FoodEng.gf</CODE>, but also all other files that it
depends on - in this case, <CODE>Food.gf</CODE>.
</P>
<P>
For each file that is compiled, a <CODE>.gfc</CODE> file
is generated. The GFC format (= "GF Canonical") is the
"machine code" of GF, which is faster to process than
GF source files. When reading a module, GF decides whether
to use an existing <CODE>.gfc</CODE> file or to generate
a new one, by looking at modification times.
</P>
<A NAME="toc21"></A>
<H2>Multilingual grammars and translation</H2>
<P>
The main advantage of separating abstract from concrete syntax is that
one abstract syntax can be equipped with many concrete syntaxes.
A system with this property is called a <B>multilingual grammar</B>.
</P>
<P>
Multilingual grammars can be used for applications such as
translation. Let us build an Italian concrete syntax for
<CODE>Food</CODE> and then test the resulting
multilingual grammar.
</P>
<A NAME="toc22"></A>
<H3>An Italian concrete syntax</H3>
<PRE>
concrete FoodIta of Food = {

  lincat
    S, Item, Kind, Quality = {s : Str} ;

  lin
    Is item quality = {s = item.s ++ "è" ++ quality.s} ;
    This kind = {s = "questo" ++ kind.s} ;
    That kind = {s = "quello" ++ kind.s} ;
    QKind quality kind = {s = kind.s ++ quality.s} ;
    Wine = {s = "vino"} ;
    Cheese = {s = "formaggio"} ;
    Fish = {s = "pesce"} ;
    Very quality = {s = "molto" ++ quality.s} ;
    Fresh = {s = "fresco"} ;
    Warm = {s = "caldo"} ;
    Italian = {s = "italiano"} ;
    Expensive = {s = "caro"} ;
    Delicious = {s = "delizioso"} ;
    Boring = {s = "noioso"} ;
}
</PRE>
<P></P>
<A NAME="toc23"></A>
<H3>Using a multilingual grammar</H3>
<P>
Import the two grammars in the same GF session:
</P>
<PRE>
> i FoodEng.gf
> i FoodIta.gf
</PRE>
<P>
Try generation now:
</P>
<PRE>
> gr | l
quello formaggio molto noioso è italiano

> gr | l -lang=FoodEng
this fish is warm
</PRE>
<P>
Translate by using a pipe:
</P>
<PRE>
> p -lang=FoodEng "this cheese is very delicious" | l -lang=FoodIta
questo formaggio è molto delizioso
</PRE>
<P>
The <CODE>lang</CODE> flag tells GF which concrete syntax to use in parsing and
linearization. By default, the flag is set to the last-imported grammar.
To see what grammars are in scope and which is the main one, use the command
<CODE>print_options = po</CODE>:
</P>
<PRE>
> print_options
main abstract : Food
main concrete : FoodIta
actual concretes : FoodIta FoodEng
</PRE>
<P></P>
<A NAME="toc24"></A>
<H3>Translation session</H3>
<P>
If translation is what you want to do with a set of grammars, a convenient
way to do it is to open a <CODE>translation_session = ts</CODE>. In this session,
you can translate between all the languages that are in scope.
A dot <CODE>.</CODE> terminates the translation session.
</P>
<PRE>
> ts

trans> that very warm cheese is boring
quello formaggio molto caldo è noioso
that very warm cheese is boring

trans> questo vino molto italiano è molto delizioso
questo vino molto italiano è molto delizioso
this very Italian wine is very delicious

trans> .
>
</PRE>
<P></P>
<A NAME="toc25"></A>
<H3>Translation quiz</H3>
<P>
This is a simple language exercise that can be automatically
generated from a multilingual grammar. The system generates a set of
random sentences, displays them in one language, and checks the user's
answer given in another language. The command <CODE>translation_quiz = tq</CODE>
runs this in a subshell of GF.
</P>
<PRE>
> translation_quiz FoodEng FoodIta

Welcome to GF Translation Quiz.
The quiz is over when you have done at least 10 examples
with at least 75 % success.
You can interrupt the quiz by entering a line consisting of a dot ('.').

this fish is warm
questo pesce è caldo
> Yes.
Score 1/1

this cheese is Italian
questo formaggio è noioso
> No, not questo formaggio è noioso, but
questo formaggio è italiano

Score 1/2
this fish is expensive
</PRE>
<P>
You can also generate a list of translation exercises and save it in a
file for later use, with the command <CODE>translation_list = tl</CODE>:
</P>
<PRE>
> translation_list -number=25 FoodEng FoodIta
</PRE>
<P>
The <CODE>number</CODE> flag gives the number of sentences generated.
</P>
<A NAME="toc26"></A>
<H2>Grammar architecture</H2>
<A NAME="toc27"></A>
<H3>Extending a grammar</H3>
<P>
The module system of GF makes it possible to <B>extend</B> a
grammar in different ways. The syntax of extension is
shown by the following example. We extend <CODE>Food</CODE> by
adding a category of questions and two new functions.
</P>
<PRE>
abstract Morefood = Food ** {
  cat
    Question ;
  fun
    QIs : Item -> Quality -> Question ;
    Pizza : Kind ;
}
</PRE>
<P>
Parallel to the abstract syntax, extensions can
be built for concrete syntaxes:
</P>
<PRE>
concrete MorefoodEng of Morefood = FoodEng ** {
  lincat
    Question = {s : Str} ;
  lin
    QIs item quality = {s = "is" ++ item.s ++ quality.s} ;
    Pizza = {s = "pizza"} ;
}
</PRE>
<P>
The effect of extension is that all of the contents of the extended
and extending modules are put together.
</P>
|
||
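<P>
To see the extension at work, you can import the extended grammar and
parse in the new category. The following session is a sketch: the
commands are as in earlier sections, but the exact output format may
vary between GF versions.
</P>
<PRE>
> i MorefoodEng.gf
> p -cat=Question "is this pizza delicious"
QIs (This Pizza) Delicious
</PRE>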
<A NAME="toc28"></A>
<H3>Multiple inheritance</H3>
<P>
Specialized vocabularies can be represented as small grammars that
only do "one thing" each. For instance, the following are grammars
for fruit and mushrooms:
</P>
<PRE>
abstract Fruit = {
  cat Fruit ;
  fun Apple, Peach : Fruit ;
}

abstract Mushroom = {
  cat Mushroom ;
  fun Cep, Agaric : Mushroom ;
}
</PRE>
<P>
They can afterwards be combined into bigger grammars by using
<B>multiple inheritance</B>, i.e. extension of several grammars at the
same time:
</P>
<PRE>
abstract Foodmarket = Food, Fruit, Mushroom ** {
  fun
    FruitKind : Fruit -> Kind ;
    MushroomKind : Mushroom -> Kind ;
}
</PRE>
<P>
At this point, you would perhaps like to go back to
<CODE>Food</CODE> and take apart <CODE>Wine</CODE> to build a special
<CODE>Drink</CODE> module.
</P>
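<P>
The concrete syntax can be assembled by multiple inheritance in the
same way. The following sketch assumes concrete modules
<CODE>FruitEng</CODE> and <CODE>MushroomEng</CODE> (not shown above)
whose linearization types are plain string records like those of
<CODE>FoodEng</CODE>, so that the coercions into <CODE>Kind</CODE>
can be identities:
</P>
<PRE>
concrete FoodmarketEng of Foodmarket = FoodEng, FruitEng, MushroomEng ** {
  lin
    FruitKind fruit = fruit ;
    MushroomKind mushroom = mushroom ;
}
</PRE>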
<A NAME="toc29"></A>
<H3>Visualizing module structure</H3>
<P>
When you have created all the abstract syntaxes and
one set of concrete syntaxes needed for <CODE>Foodmarket</CODE>,
your grammar consists of eight GF modules. To see what their
dependencies look like, you can use the command
<CODE>visualize_graph = vg</CODE>,
</P>
<PRE>
> visualize_graph
</PRE>
<P>
and the graph will pop up in a separate window.
</P>
<P>
The graph uses
</P>
<UL>
<LI>oval boxes for abstract modules
<LI>square boxes for concrete modules
<LI>black-headed arrows for inheritance
<LI>white-headed arrows for the concrete-of-abstract relation
<P></P>
<IMG ALIGN="middle" SRC="Foodmarket.png" BORDER="0" ALT="module dependency graph of Foodmarket">
</UL>

<A NAME="toc30"></A>
<H2>System commands</H2>
<P>
To document your grammar, you may want to print the
graph into a file, e.g. a <CODE>.png</CODE> file that
can be included in an HTML document. You can do this
by first printing the graph into a <CODE>.dot</CODE> file and then
processing this file with the <CODE>dot</CODE> program.
</P>
<PRE>
> pm -printer=graph | wf Foodmarket.dot
> ! dot -Tpng Foodmarket.dot > Foodmarket.png
</PRE>
<P>
The latter command is a Unix command, issued from GF by using the
shell escape symbol <CODE>!</CODE>. The resulting graph was shown in the previous section.
</P>
<P>
The command <CODE>print_multi = pm</CODE> is used for printing the current multilingual
grammar in various formats, of which the format <CODE>-printer=graph</CODE> just
shows the module dependencies. Use the <CODE>help</CODE> command to see what other formats
are available:
</P>
<PRE>
> help pm
> help -printer
</PRE>
<P></P>
<A NAME="toc31"></A>
<H2>Resource modules</H2>
<A NAME="toc32"></A>
<H3>The golden rule of functional programming</H3>
<P>
In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format still looks rather
verbose, and demands many more characters to be written. You have probably
written much of this code by the copy-paste-modify method, which is a standard way to
avoid repeating work.
</P>
<P>
However, there is a more elegant way to avoid repeating work than the copy-and-paste
method. The <B>golden rule of functional programming</B> says that
</P>
<UL>
<LI>whenever you find yourself programming by copy-and-paste, write a function instead.
</UL>

<P>
A function separates the shared parts of different computations from the
changing parts, the parameters. In functional programming languages, such as
<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share much more than in
languages such as C and Java.
</P>
<A NAME="toc33"></A>
<H3>Operation definitions</H3>
<P>
GF is a functional programming language, not only in the sense that
the abstract syntax is a system of functions (<CODE>fun</CODE>), but also because
functional programming can be used to define concrete syntax. This is
done by using a new form of judgement, with the keyword <CODE>oper</CODE> (for
<B>operation</B>), distinct from <CODE>fun</CODE> for the sake of clarity.
Here is a simple example of an operation:
</P>
<PRE>
oper ss : Str -> {s : Str} = \x -> {s = x} ;
</PRE>
<P>
The operation can be <B>applied</B> to an argument, and GF will
<B>compute</B> the application into a value. For instance,
</P>
<PRE>
ss "boy" ---> {s = "boy"}
</PRE>
<P>
(We use the symbol <CODE>---></CODE> to indicate how an expression is
computed into a value; this symbol is not a part of GF.)
</P>
<P>
Thus an <CODE>oper</CODE> judgement includes the name of the defined operation,
its type, and an expression defining it. As for the syntax of the defining
expression, notice the <B>lambda abstraction</B> form <CODE>\x -> t</CODE> of
the function.
</P>
<A NAME="toc34"></A>
<H3>The ``resource`` module type</H3>
<P>
Operation definitions can be included in a concrete syntax.
But they are not really tied to a particular set of linearization rules.
They should rather be seen as <B>resources</B>
usable in many concrete syntaxes.
</P>
<P>
The <CODE>resource</CODE> module type can be used to package
<CODE>oper</CODE> definitions into reusable resources. Here is
an example, with a handful of operations to manipulate
strings and records.
</P>
<PRE>
resource StringOper = {
  oper
    SS : Type = {s : Str} ;

    ss : Str -> SS = \x -> {s = x} ;

    cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;

    prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
}
</PRE>
<P>
Resource modules can extend other resource modules, in the
same way as modules of other types can extend modules of the
same type. Thus it is possible to build resource hierarchies.
</P>
<A NAME="toc35"></A>
<H3>Opening a ``resource``</H3>
<P>
Any number of <CODE>resource</CODE> modules can be
<B>opened</B> in a <CODE>concrete</CODE> syntax, which
makes the definitions contained
in the resource usable in the concrete syntax. Here is
an example, where the resource <CODE>StringOper</CODE> is
opened in a new version of <CODE>FoodEng</CODE>.
</P>
<PRE>
concrete Food2Eng of Food = open StringOper in {

  lincat
    S, Item, Kind, Quality = SS ;

  lin
    Is item quality = cc item (prefix "is" quality) ;
    This = prefix "this" ;
    That = prefix "that" ;
    QKind = cc ;
    Wine = ss "wine" ;
    Cheese = ss "cheese" ;
    Fish = ss "fish" ;
    Very = prefix "very" ;
    Fresh = ss "fresh" ;
    Warm = ss "warm" ;
    Italian = ss "Italian" ;
    Expensive = ss "expensive" ;
    Delicious = ss "delicious" ;
    Boring = ss "boring" ;

}
</PRE>
<P>
The same string operations could be used to write <CODE>FoodIta</CODE>
more concisely.
</P>
<A NAME="toc36"></A>
<H3>Division of labour</H3>
<P>
Using operations defined in resource modules is a
way to avoid repetitive code.
In addition, it enables a new kind of modularity
and division of labour in grammar writing: grammarians familiar with
the linguistic details of a language can make this knowledge
available through resource grammar modules, whose users need
only pick the right operations, without knowing their implementation
details.
</P>
<A NAME="toc37"></A>
<H2>Morphology</H2>
<P>
Suppose we want to say, with the vocabulary included in
<CODE>Food.gf</CODE>, things like
</P>
<PRE>
all Italian wines are delicious
</PRE>
<P>
The new grammatical facility we need is the plural forms
of nouns and verbs (<I>wines, are</I>), as opposed to their
singular forms.
</P>
<P>
The introduction of plural forms requires two things:
</P>
<UL>
<LI>to <B>inflect</B> nouns and verbs in singular and plural number
<LI>to describe the <B>agreement</B> of the verb with the subject: the
rule that the verb must have the same number as the subject
</UL>

<P>
Different languages have different rules of inflection and agreement.
For instance, Italian also has agreement in gender (masculine vs. feminine).
We want to express such special features of languages in the
concrete syntax while ignoring them in the abstract syntax.
</P>
<P>
To be able to do all this, we need one new judgement form
and many new expression forms.
We also need to generalize linearization types
from strings to more complex types.
</P>
<A NAME="toc38"></A>
<H3>Parameters and tables</H3>
<P>
We define the <B>parameter type</B> of number in English by
using a new form of judgement:
</P>
<PRE>
param Number = Sg | Pl ;
</PRE>
<P>
To express that <CODE>Kind</CODE> expressions in English have a linearization
depending on number, we replace the linearization type <CODE>{s : Str}</CODE>
with a type where the <CODE>s</CODE> field is a <B>table</B> depending on number:
</P>
<PRE>
lincat Kind = {s : Number => Str} ;
</PRE>
<P>
The <B>table type</B> <CODE>Number => Str</CODE> is in many respects similar to
a function type (<CODE>Number -> Str</CODE>). The main difference is that the
argument type of a table type must always be a parameter type. This means
that the argument-value pairs can be listed in a finite table. The following
example shows such a table:
</P>
<PRE>
lin Cheese = {s = table {
  Sg => "cheese" ;
  Pl => "cheeses"
  }
} ;
</PRE>
<P>
The application of a table to a parameter is done by the <B>selection</B>
operator <CODE>!</CODE>. For instance,
</P>
<PRE>
Cheese.s ! Pl
</PRE>
<P>
is a selection, whose value is <CODE>"cheeses"</CODE>.
</P>
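<P>
A table need not give a different value for every parameter: a noun
whose singular and plural forms coincide simply repeats the same
string. The following rule, written in the same style as the
<CODE>Cheese</CODE> rule above, is a sketch of such an invariant noun:
</P>
<PRE>
lin Fish = {s = table {
  Sg => "fish" ;
  Pl => "fish"
  }
} ;
</PRE>
<P>
so that both <CODE>Fish.s ! Sg</CODE> and <CODE>Fish.s ! Pl</CODE> compute
to <CODE>"fish"</CODE>.
</P>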
<A NAME="toc39"></A>
<H3>Inflection tables, paradigms, and ``oper`` definitions</H3>
<P>
All English common nouns are inflected in number, most of them in the
same way: the plural form is formed from the singular form by adding the
ending <I>s</I>. This rule is an example of
a <B>paradigm</B> - a formula telling how the inflection
forms of a word are formed.
</P>
<P>
From the GF point of view, a paradigm is a function that takes a <B>lemma</B> -
a string also known as a <B>dictionary form</B> - and returns an inflection
table of the desired type. Paradigms are not functions in the sense of the
<CODE>fun</CODE> judgements of abstract syntax (which operate on trees and not
on strings), but operations defined in <CODE>oper</CODE> judgements.
The following operation defines the regular noun paradigm of English:
</P>
<PRE>
oper regNoun : Str -> {s : Number => Str} = \x -> {
  s = table {
    Sg => x ;
    Pl => x + "s"
    }
  } ;
</PRE>
<P>
The <B>gluing</B> operator <CODE>+</CODE> indicates that
the string held in the variable <CODE>x</CODE> and the ending <CODE>"s"</CODE>
are written together to form one <B>token</B>. Thus, for instance,
</P>
<PRE>
(regNoun "cheese").s ! Pl ---> "cheese" + "s" ---> "cheeses"
</PRE>
<P></P>
<A NAME="toc40"></A>
<H3>Worst-case macros and data abstraction</H3>
<P>
Some English nouns, such as <CODE>mouse</CODE>, are so irregular that
it makes no sense to see them as instances of a paradigm. Even
then, it is useful to perform <B>data abstraction</B> from the
definition of the type <CODE>Noun</CODE>, and introduce a constructor
operation, a <B>worst-case macro</B> for nouns:
</P>
<PRE>
oper mkNoun : Str -> Str -> Noun = \x,y -> {
  s = table {
    Sg => x ;
    Pl => y
    }
  } ;
</PRE>
<P>
Thus we could define
</P>
<PRE>
lin Mouse = mkNoun "mouse" "mice" ;
</PRE>
<P>
and
</P>
<PRE>
oper regNoun : Str -> Noun = \x ->
  mkNoun x (x + "s") ;
</PRE>
<P>
instead of writing the inflection table explicitly.
</P>
<P>
The grammar engineering advantage of worst-case macros is that
the author of the resource module may change the definitions of
<CODE>Noun</CODE> and <CODE>mkNoun</CODE>, and still retain the
interface (i.e. the system of type signatures) that makes it
correct to use these functions in concrete modules. In programming
terms, <CODE>Noun</CODE> is then treated as an <B>abstract datatype</B>.
</P>
<A NAME="toc41"></A>
<H3>A system of paradigms using ``Prelude`` operations</H3>
<P>
In addition to the completely regular noun paradigm <CODE>regNoun</CODE>,
some other frequent noun paradigms deserve to be
defined, for instance,
</P>
<PRE>
sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ;
</PRE>
<P>
What about nouns like <I>fly</I>, with the plural <I>flies</I>? One
available solution is to use the longest common prefix
<I>fl</I> (also known as the <B>technical stem</B>) as argument, and define
</P>
<PRE>
yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ;
</PRE>
<P>
But this paradigm would be very unintuitive to use, because the technical stem
is not an existing form of the word. A better solution is to use
the lemma and a string operator <CODE>init</CODE>, which returns the initial segment (i.e.
all characters but the last) of a string:
</P>
<PRE>
yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
</PRE>
<P>
The operator <CODE>init</CODE> belongs to a set of operations in the
resource module <CODE>Prelude</CODE>, which therefore has to be
<CODE>open</CODE>ed so that <CODE>init</CODE> can be used.
</P>
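<P>
With this version, both forms are computed from the lemma itself. In
the computation notation used above:
</P>
<PRE>
(yNoun "fly").s ! Sg ---> "fly"
(yNoun "fly").s ! Pl ---> init "fly" + "ies" ---> "flies"
</PRE>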
<A NAME="toc42"></A>
<H3>An intelligent noun paradigm using ``case`` expressions</H3>
<P>
It may be hard for the user of a resource morphology to pick the right
inflection paradigm. A way to help is to define a more intelligent
paradigm, which chooses the ending by first analysing the lemma.
The following variant for English regular nouns puts together all the
previously shown paradigms, and chooses one of them on the basis of
the final letter of the lemma (found by the prelude operator <CODE>last</CODE>).
</P>
<PRE>
regNoun : Str -> Noun = \s -> case last s of {
  "s" | "z" => mkNoun s (s + "es") ;
  "y"       => mkNoun s (init s + "ies") ;
  _         => mkNoun s (s + "s")
  } ;
</PRE>
<P>
This definition displays many GF expression forms not shown before;
these forms are explained in the next section.
</P>
<P>
The paradigm <CODE>regNoun</CODE> does not give the correct forms for
all nouns. For instance, <I>mouse - mice</I> and
<I>fish - fish</I> must be given by using <CODE>mkNoun</CODE>.
Also the word <I>boy</I> would be inflected incorrectly; to prevent
this, either use <CODE>mkNoun</CODE> or modify
<CODE>regNoun</CODE> so that the <CODE>"y"</CODE> case does not
apply if the second-last character is a vowel.
</P>
<A NAME="toc43"></A>
<H3>Pattern matching</H3>
<P>
Expressions of the <CODE>table</CODE> form are built from lists of
argument-value pairs. These pairs are called the <B>branches</B>
of the table. In addition to constants introduced in
<CODE>param</CODE> definitions, the left-hand side of a branch can more
generally be a <B>pattern</B>, and the computation of selection is
then performed by <B>pattern matching</B>:
</P>
<UL>
<LI>a variable pattern (an identifier other than a parameter constant) matches anything
<LI>the wild card <CODE>_</CODE> matches anything
<LI>a string literal pattern, e.g. <CODE>"s"</CODE>, matches the same string
<LI>a disjunctive pattern <CODE>P | ... | Q</CODE> matches anything that
one of the disjuncts matches
</UL>

<P>
Pattern matching is performed in the order in which the branches
appear in the table: the branch of the first matching pattern is followed.
</P>
<P>
As syntactic sugar, one-branch tables can be written concisely:
</P>
<PRE>
\\P,...,Q => t === table {P => ... table {Q => t} ...}
</PRE>
<P>
Finally, the <CODE>case</CODE> expressions common in functional
programming languages are syntactic sugar for table selections:
</P>
<PRE>
case e of {...} === table {...} ! e
</PRE>
<P></P>
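<P>
For instance, the <CODE>case</CODE> expression in the intelligent
<CODE>regNoun</CODE> paradigm above is just sugar for a table selected
at <CODE>last s</CODE>; the following is a direct instance of the rule:
</P>
<PRE>
case last s of {
  "y" => mkNoun s (init s + "ies") ;
  _   => mkNoun s (s + "s")
  }
===
table {
  "y" => mkNoun s (init s + "ies") ;
  _   => mkNoun s (s + "s")
  } ! (last s)
</PRE>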
<A NAME="toc44"></A>
<H3>Morphological ``resource`` modules</H3>
<P>
A common idiom is to
gather the <CODE>oper</CODE> and <CODE>param</CODE> definitions
needed for inflecting words in
a language into a morphology module. Here is a simple
example, <A HREF="MorphoEng.gf"><CODE>MorphoEng</CODE></A>.
</P>
<PRE>
--# -path=.:prelude

resource MorphoEng = open Prelude in {

  param
    Number = Sg | Pl ;

  oper
    Noun, Verb : Type = {s : Number => Str} ;

    mkNoun : Str -> Str -> Noun = \x,y -> {
      s = table {
        Sg => x ;
        Pl => y
        }
      } ;

    regNoun : Str -> Noun = \s -> case last s of {
      "s" | "z" => mkNoun s (s + "es") ;
      "y"       => mkNoun s (init s + "ies") ;
      _         => mkNoun s (s + "s")
      } ;

    mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ;

    regVerb : Str -> Verb = \s -> case last s of {
      "s" | "z" => mkVerb s (s + "es") ;
      "y"       => mkVerb s (init s + "ies") ;
      "o"       => mkVerb s (s + "es") ;
      _         => mkVerb s (s + "s")
      } ;
}
</PRE>
<P>
The first line gives the compiler a hint about the
<B>search path</B> needed to find all the other modules that the
module depends on. The directory <CODE>prelude</CODE> is a subdirectory of
<CODE>GF/lib</CODE>; to be able to refer to it in this simple way, you can
set the environment variable <CODE>GF_LIB_PATH</CODE> to point to this
directory.
</P>
<A NAME="toc45"></A>
<H3>Testing ``resource`` modules</H3>
<P>
To test a <CODE>resource</CODE> module independently, you can import it
with a flag that tells GF to retain the <CODE>oper</CODE> definitions
in memory; the usual behaviour is that <CODE>oper</CODE> definitions
are just applied to compile linearization rules
(this is called <B>inlining</B>) and then thrown away.
</P>
<PRE>
> i -retain MorphoEng.gf
</PRE>
<P></P>
<P>
The command <CODE>compute_concrete = cc</CODE> computes any expression
formed by operations and other GF constructs. For example,
</P>
<PRE>
> cc regVerb "echo"
{s : Number => Str = table Number {
  Sg => "echoes" ;
  Pl => "echo"
  }
}
</PRE>
<P></P>
<P>
The command <CODE>show_operations = so</CODE> shows the type signatures
of all operations returning a given value type:
</P>
<PRE>
> so Verb
MorphoEng.mkNoun : Str -> Str -> {s : {MorphoEng.Number} => Str}
MorphoEng.mkVerb : Str -> Str -> {s : {MorphoEng.Number} => Str}
MorphoEng.regNoun : Str -> {s : {MorphoEng.Number} => Str}
MorphoEng.regVerb : Str -> {s : {MorphoEng.Number} => Str}
</PRE>
<P>
Why does the command also show the operations that form
<CODE>Noun</CODE>s? The reason is that the type expression
<CODE>Verb</CODE> is first computed, and its value happens to be
the same as the value of <CODE>Noun</CODE>.
</P>
<A NAME="toc46"></A>
<H2>Using morphology in concrete syntax</H2>
<P>
We can now enrich the concrete syntax definitions to
include morphology. This will involve a more radical
variation between languages (e.g. English and Italian)
than just the use of different words. In general,
parameters and linearization types are different in
different languages - but this does not prevent the
use of a common abstract syntax.
</P>
<A NAME="toc47"></A>
<H3>Parametric vs. inherent features, agreement</H3>
<P>
The rule of subject-verb agreement in English says that the verb
phrase must be inflected in the number of the subject. This
means that a noun phrase (functioning as a subject) inherently
<I>has</I> a number, which it passes to the verb. The verb does not
<I>have</I> a number, but must be able to receive whatever number the
subject has. This distinction is nicely represented by the
different linearization types of <B>noun phrases</B> and <B>verb phrases</B>:
</P>
<PRE>
lincat NP = {s : Str ; n : Number} ;
lincat VP = {s : Number => Str} ;
</PRE>
<P>
We say that the number of <CODE>NP</CODE> is an <B>inherent feature</B>,
whereas the number of <CODE>VP</CODE> is <B>parametric</B>.
</P>
<P>
The agreement rule itself is expressed in the linearization rule of
the predication structure:
</P>
<PRE>
lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
</PRE>
<P>
The following section will present
<CODE>FoodsEng</CODE>, assuming the abstract syntax <CODE>Foods</CODE>,
which is similar to <CODE>Food</CODE> but also has the
plural determiners <CODE>All</CODE> and <CODE>Most</CODE>.
The reader is invited to inspect the way in which agreement works in
the formation of sentences.
</P>
<A NAME="toc48"></A>
<H3>English concrete syntax with parameters</H3>
<PRE>
--# -path=.:prelude

concrete FoodsEng of Foods = open Prelude, MorphoEng in {

  lincat
    S, Quality = SS ;
    Kind = {s : Number => Str} ;
    Item = {s : Str ; n : Number} ;

  lin
    Is item quality = ss (item.s ++ (mkVerb "are" "is").s ! item.n ++ quality.s) ;
    This = det Sg "this" ;
    That = det Sg "that" ;
    All = det Pl "all" ;
    Most = det Pl "most" ;
    QKind quality kind = {s = \\n => quality.s ++ kind.s ! n} ;
    Wine = regNoun "wine" ;
    Cheese = regNoun "cheese" ;
    Fish = mkNoun "fish" "fish" ;
    Very = prefixSS "very" ;
    Fresh = ss "fresh" ;
    Warm = ss "warm" ;
    Italian = ss "Italian" ;
    Expensive = ss "expensive" ;
    Delicious = ss "delicious" ;
    Boring = ss "boring" ;

  oper
    det : Number -> Str -> Noun -> {s : Str ; n : Number} = \n,d,cn -> {
      s = d ++ cn.s ! n ;
      n = n
      } ;

}
</PRE>
<P></P>
<A NAME="toc49"></A>
<H3>Hierarchic parameter types</H3>
<P>
The reader familiar with a functional programming language such as
<A HREF="http://www.haskell.org">Haskell</A> will have noticed the similarity
between parameter types in GF and <B>algebraic datatypes</B> (<CODE>data</CODE> definitions
in Haskell). The GF parameter types are actually a special case of algebraic
datatypes: the main restriction is that in GF, these types must be finite.
(It is this restriction that makes it possible to invert linearization rules into
parsing methods.)
</P>
<P>
However, finite is not the same thing as enumerated. Even in GF, parameter
constructors can take arguments, provided these arguments are from other
parameter types - only recursion is forbidden. Such parameter types impose a
hierarchic order among parameters. They are often needed to define
the linguistically most accurate parameter systems.
</P>
<P>
To give an example, Swedish adjectives
are inflected in number (singular or plural) and
gender (uter or neuter). These parameters would suggest 2*2=4 different
forms. However, the gender distinction is made only in the singular. Therefore,
it would be inaccurate to define adjective paradigms using the type
<CODE>Gender => Number => Str</CODE>. The following hierarchic definition
yields an accurate system of three adjectival forms.
</P>
<PRE>
param AdjForm = ASg Gender | APl ;
param Gender = Uter | Neuter ;
</PRE>
<P>
In pattern matching, a constructor can have patterns as arguments. For instance,
the adjectival paradigm in which the two singular forms are the same can be defined
</P>
<PRE>
oper plattAdj : Str -> AdjForm => Str = \x -> table {
  ASg _ => x ;
  APl   => x + "a"
  } ;
</PRE>
<P></P>
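<P>
For comparison, a paradigm that does distinguish the two singular
forms matches on the <CODE>Gender</CODE> argument of <CODE>ASg</CODE>.
The endings below are those of regular Swedish adjectives
(e.g. <I>gul, gult, gula</I> "yellow"); the operation itself is an
illustrative sketch, not part of the grammar above.
</P>
<PRE>
oper regAdj : Str -> AdjForm => Str = \x -> table {
  ASg Uter   => x ;
  ASg Neuter => x + "t" ;
  APl        => x + "a"
  } ;
</PRE>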
<A NAME="toc50"></A>
<H3>Morphological analysis and morphology quiz</H3>
<P>
Even though in GF morphology
is mostly seen as an auxiliary of syntax, a morphology, once defined,
can be used in its own right. The command <CODE>morpho_analyse = ma</CODE>
can be used to read a text and return for each word the analyses that
it has in the current concrete syntax.
</P>
<PRE>
> rf bible.txt | morpho_analyse
</PRE>
<P>
In the same way as translation exercises, morphological exercises can
be generated, by the command <CODE>morpho_quiz = mq</CODE>. Usually,
the category is set to be something other than <CODE>S</CODE>. For instance,
</P>
<PRE>
> i lib/resource/french/VerbsFre.gf
> morpho_quiz -cat=V

Welcome to GF Morphology Quiz.
...

réapparaître : VFin VCondit Pl P2
réapparaitriez
> No, not réapparaitriez, but
réapparaîtriez
Score 0/1
</PRE>
<P>
Finally, you can generate a list of morphological exercises and save it in a
file for later use, with the command <CODE>morpho_list = ml</CODE>:
</P>
<PRE>
> morpho_list -number=25 -cat=V
</PRE>
<P>
The <CODE>number</CODE> flag gives the number of exercises generated.
</P>
<A NAME="toc51"></A>
<H3>Discontinuous constituents</H3>
<P>
A linearization type may contain more than one string.
An example of where this is useful is English particle
verbs, such as <I>switch off</I>. The linearization of
a sentence may place the object between the verb and the particle:
<I>he switched it off</I>.
</P>
<P>
The first of the following judgements defines transitive verbs as
<B>discontinuous constituents</B>, i.e. as having a linearization
type with two strings and not just one. The second judgement
shows how the constituents are separated by the object in complementization.
</P>
<PRE>
lincat TV = {s : Number => Str ; s2 : Str} ;
lin ComplTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.s2} ;
</PRE>
<P>
There is no restriction on the number of discontinuous constituents
(or other fields) a <CODE>lincat</CODE> may contain. The only condition is that
the fields must be of finite types, i.e. built from records, tables,
parameters, and <CODE>Str</CODE>, and not functions. A mathematical result
about parsing in GF says that the worst-case complexity of parsing
increases with the number of discontinuous constituents. Moreover,
the parsing and linearization commands only give reliable results
for categories whose linearization type has a unique <CODE>Str</CODE>-valued
field labelled <CODE>s</CODE>.
</P>
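<P>
A lexical entry for such a verb fills both fields. The following
sketch, written with an explicit table in the style of the earlier
examples, assumes an abstract function <CODE>SwitchOff : TV</CODE>
(hypothetical, not defined above):
</P>
<PRE>
lin SwitchOff = {
  s  = table {Sg => "switches" ; Pl => "switch"} ;
  s2 = "off"
  } ;
</PRE>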
<A NAME="toc52"></A>
<H2>More constructs for concrete syntax</H2>
<A NAME="toc53"></A>
<H3>Free variation</H3>
<P>
Sometimes there are many alternative ways to define a concrete syntax.
For instance, the verb negation in English can be expressed both by
<I>does not</I> and <I>doesn't</I>. In linguistic terms, these expressions
are in <B>free variation</B>. The <CODE>variants</CODE> construct of GF can
be used to give a list of strings in free variation. For example,
</P>
<PRE>
NegVerb verb = {s = variants {["does not"] ; "doesn't"} ++ verb.s} ;
</PRE>
<P>
An empty variant list
</P>
<PRE>
variants {}
</PRE>
<P>
can be used e.g. if a word lacks a certain form.
</P>
<P>
In general, <CODE>variants</CODE> should be used cautiously. It is not
recommended for modules intended as libraries, because the
user of the library has no way to choose among the variants.
Moreover, even though <CODE>variants</CODE> admits lists of any type,
its semantics for complex types can cause surprises.
</P>
<A NAME="toc54"></A>
<H3>Record extension and subtyping</H3>
<P>
Record types and records can be <B>extended</B> with new fields. For instance,
in German it is natural to see transitive verbs as verbs with a case.
The symbol <CODE>**</CODE> is used for both constructs.
</P>
<PRE>
lincat TV = Verb ** {c : Case} ;

lin Follow = regVerb "folgen" ** {c = Dative} ;
</PRE>
<P>
To extend a record type or a record with a field whose label it
already has is a type error.
</P>
<P>
A record type <I>T</I> is a <B>subtype</B> of another record type <I>R</I> if <I>T</I> has
all the fields of <I>R</I> and possibly other fields. For instance,
an extension of a record type is always a subtype of it.
</P>
<P>
If <I>T</I> is a subtype of <I>R</I>, an object of <I>T</I> can be used whenever
an object of <I>R</I> is required. For instance, a transitive verb can
be used whenever a verb is required.
</P>
<P>
<B>Contravariance</B> means that a function taking an <I>R</I> as argument
can also be applied to any object of a subtype <I>T</I>.
</P>
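<P>
For example, an operation defined on plain verbs can, by
contravariance, also be applied to a transitive verb such as
<CODE>Follow</CODE> above; the extra <CODE>c</CODE> field is simply
ignored. The operation <CODE>sg3</CODE> below is a hypothetical sketch,
assuming the <CODE>Verb</CODE> type of <CODE>MorphoEng</CODE>:
</P>
<PRE>
oper sg3 : Verb -> Str = \v -> v.s ! Sg ;

-- by contravariance, sg3 is applicable to objects of TV = Verb ** {c : Case}
</PRE>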
<A NAME="toc55"></A>
<H3>Tuples and product types</H3>
<P>
Product types and tuples are syntactic sugar for record types and records:
</P>
<PRE>
T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn}
&lt;t1, ..., tn&gt; === {p1 = t1 ; ... ; pn = tn}
</PRE>
<P>
Thus the labels <CODE>p1, p2, ...</CODE> are hard-coded.
</P>
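<P>
As a small illustration (a sketch), a pair of strings can be formed
as a tuple and taken apart with the hard-coded projection labels:
</P>
<PRE>
oper pair : Str * Str = &lt;"uno", "due"&gt; ;

pair.p1 ---> "uno"
pair.p2 ---> "due"
</PRE>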
<A NAME="toc56"></A>
<H3>Prefix-dependent choices</H3>
<P>
The <CODE>pre</CODE> construct makes the choice of a token depend on
the prefix of the token that follows it. It is exemplified in
</P>
<PRE>
oper artIndef : Str =
  pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ;
</PRE>
<P>
Thus
</P>
<PRE>
artIndef ++ "cheese" ---> "a" ++ "cheese"
artIndef ++ "apple" ---> "an" ++ "apple"
</PRE>
<P>
This very example does not work in all situations: the letter
<I>u</I> follows no general rule, and some problematic words are
<I>euphemism, one-eyed, n-gram</I>. It is possible to write
</P>
<PRE>
oper artIndef : Str =
  pre {"a" ;
       "a" / strs {"eu" ; "one"} ;
       "an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"}
       } ;
</PRE>
<P></P>
<A NAME="toc57"></A>
|
||
<H3>Predefined types and operations</H3>
|
||
<P>
|
||
GF has the following predefined categories in abstract syntax:
|
||
</P>
|
||
<PRE>
|
||
cat Int ; -- integers, e.g. 0, 5, 743145151019
|
||
cat Float ; -- floats, e.g. 0.0, 3.1415926
|
||
cat String ; -- strings, e.g. "", "foo", "123"
|
||
</PRE>
|
||
<P>
|
||
The objects of each of these categories are <B>literals</B>
|
||
as indicated in the comments above. No <CODE>fun</CODE> definition
|
||
can have a predefined category as its value type, but
|
||
they can be used as arguments. For example:
|
||
</P>
<PRE>
  fun StreetAddress : Int -> String -> Address ;
  lin StreetAddress number street = {s = number.s ++ street.s} ;

  -- e.g. (StreetAddress 10 "Downing Street") : Address
</PRE>
<P></P>
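<P>
Literals are recognized in parsing only if the lexer recognizes them,
e.g. with the flag <CODE>-lexer=literals</CODE> described below. A sketch
of a session (the quoting details of string literals may vary):
</P>
<PRE>
  > parse -lexer=literals -cat=Address "10 \"Downing Street\""
  StreetAddress 10 "Downing Street"
</PRE>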
<A NAME="toc58"></A>
<H2>More features of the module system</H2>
<A NAME="toc59"></A>
<H3>Resource grammars and their reuse</H3>
<P>
See the
<A HREF="../../lib/resource/doc/gf-resource.html">resource library documentation</A>.
</P>
<A NAME="toc60"></A>
<H3>Interfaces, instances, and functors</H3>
<P>
See an
<A HREF="../../examples/mp3/mp3-resource.html">example built this way</A>.
</P>
<A NAME="toc61"></A>
<H3>Restricted inheritance and qualified opening</H3>
<A NAME="toc62"></A>
<H2>More concepts of abstract syntax</H2>
<A NAME="toc63"></A>
<H3>Dependent types</H3>
<A NAME="toc64"></A>
<H3>Higher-order abstract syntax</H3>
<A NAME="toc65"></A>
<H3>Semantic definitions</H3>
<A NAME="toc66"></A>
<H2>Transfer modules</H2>
<P>
Transfer means noncompositional tree-transforming operations.
The command <CODE>apply_transfer = at</CODE> is typically used in a pipe:
</P>
<PRE>
  > p "John walks and John runs" | apply_transfer aggregate | l
  John walks and runs
</PRE>
<P>
See the
<A HREF="../../transfer/examples/aggregation">sources</A> of this example.
</P>
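<P>
Conceptually, the aggregation transfer shares the common subject of the
two conjuncts. With hypothetical tree constructors (not necessarily those
of the example grammar), the transformation is roughly:
</P>
<PRE>
  ConjS And (PredVP John Walk) (PredVP John Run)
    --->
  PredVP John (ConjVP And Walk Run)
</PRE>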
<P>
See the
<A HREF="../transfer.html">transfer language documentation</A>
for more information.
</P>
<A NAME="toc67"></A>
<H2>Practical issues</H2>
<A NAME="toc68"></A>
<H3>Lexers and unlexers</H3>
<P>
Lexers and unlexers can be chosen from
a list of predefined ones, using the flags <CODE>-lexer</CODE> and <CODE>-unlexer</CODE>, either
in the grammar file or on the GF command line.
</P>
<P>
The following lists are given by <CODE>help -lexer</CODE> and <CODE>help -unlexer</CODE>:
</P>
<PRE>
The default is words.
  -lexer=words     tokens are separated by spaces or newlines
  -lexer=literals  like words, but GF integer and string literals recognized
  -lexer=vars      like words, but "x","x_...","$...$" as vars, "?..." as meta
  -lexer=chars     each character is a token
  -lexer=code      use Haskell's lex
  -lexer=codevars  like code, but treat unknown words as variables, ?? as meta
  -lexer=text      with conventions on punctuation and capital letters
  -lexer=codelit   like code, but treat unknown words as string literals
  -lexer=textlit   like text, but treat unknown words as string literals
  -lexer=codeC     use a C-like lexer
  -lexer=ignore    like literals, but ignore unknown words
  -lexer=subseqs   like ignore, but then try all subsequences from longest

The default is unwords.
  -unlexer=unwords  space-separated token list (like unwords)
  -unlexer=text     format as text: punctuation, capitals, paragraph &lt;p&gt;
  -unlexer=code     format as code (spacing, indentation)
  -unlexer=textlit  like text, but remove string literal quotes
  -unlexer=codelit  like code, but remove string literal quotes
  -unlexer=concat   remove all spaces
  -unlexer=bind     like identity, but bind at "&amp;+"
</PRE>
<P></P>
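<P>
A sketch of the two ways of selecting a lexer and unlexer, here the
<CODE>text</CODE> pair (the exact flag syntax in grammar files may vary
between GF versions):
</P>
<PRE>
  -- in a grammar file:
  flags lexer=text ; unlexer=text ;

  -- on the GF command line, in a pipe:
  > parse -lexer=text "John walks." | linearize -unlexer=text
</PRE>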
<A NAME="toc69"></A>
<H3>Efficiency of grammars</H3>
<P>
Issues:
</P>
<UL>
<LI>the choice of data structures in <CODE>lincat</CODE>s
<LI>the value of the <CODE>optimize</CODE> flag
<LI>parsing efficiency: <CODE>-mcfg</CODE> vs. others
</UL>

<A NAME="toc70"></A>
<H3>Speech input and output</H3>
<P>
The <CODE>speak_aloud = sa</CODE> command sends a string to the speech
synthesizer
<A HREF="http://www.speech.cs.cmu.edu/flite/doc/">Flite</A>.
It is typically used via a pipe:
</P>
<PRE>
  generate_random | linearize | speak_aloud
</PRE>
<P>
The result is only satisfactory for English.
</P>
<P>
The <CODE>speech_input = si</CODE> command receives a string from a
speech recognizer that requires the installation of
<A HREF="http://mi.eng.cam.ac.uk/~sjy/software.htm">ATK</A>.
It is typically used to pipe input to a parser:
</P>
<PRE>
  speech_input -tr | parse
</PRE>
<P>
The method works only for grammars of English.
</P>
<P>
Both Flite and ATK are freely available through the links
above, but they are not distributed together with GF.
</P>
<A NAME="toc71"></A>
<H3>Multilingual syntax editor</H3>
<P>
The
<A HREF="http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">Editor User Manual</A>
describes the use of the editor, which works for any multilingual GF grammar.
</P>
<P>
Here is a snapshot of the editor:
</P>
<P>
<IMG ALIGN="middle" SRC="../quick-editor.gif" BORDER="0" ALT="">
</P>
<P>
The grammars of the snapshot are from the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/examples/letter">Letter grammar package</A>.
</P>
<A NAME="toc72"></A>
<H3>Interactive Development Environment (IDE)</H3>
<P>
Forthcoming.
</P>
<A NAME="toc73"></A>
<H3>Communicating with GF</H3>
<P>
Other processes can communicate with the GF command interpreter,
and also with the GF syntax editor.
</P>
<A NAME="toc74"></A>
<H3>Embedded grammars in Haskell, Java, and Prolog</H3>
<P>
GF grammars can be used as parts of programs written in the
following languages. The links give more documentation.
</P>
<UL>
<LI><A HREF="http://www.cs.chalmers.se/~bringert/gf/gf-java.html">Java</A>
<LI><A HREF="http://www.cs.chalmers.se/~aarne/GF/src/GF/Embed/EmbedAPI.hs">Haskell</A>
<LI><A HREF="http://www.cs.chalmers.se/~peb/software.html">Prolog</A>
</UL>

<A NAME="toc75"></A>
<H3>Alternative input and output grammar formats</H3>
<P>
A summary is given in the following chart of GF grammar compiler phases:
<IMG ALIGN="middle" SRC="../gf-compiler.png" BORDER="0" ALT="">
</P>
<A NAME="toc76"></A>
<H2>Case studies</H2>
<A NAME="toc77"></A>
<H3>Interfacing formal and natural languages</H3>
<P>
<A HREF="http://www.cs.chalmers.se/~krijo/thesis/thesisA4.pdf">Formal and Informal Software Specifications</A>,
a PhD thesis by
<A HREF="http://www.cs.chalmers.se/~krijo">Kristofer Johannisson</A>, is an extensive example of this.
The system is based on a multilingual grammar relating the formal language OCL with
English and German.
</P>
<P>
A simpler example will be explained here.
</P>

<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc gf-tutorial2.txt -->
</BODY></HTML>