<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>Grammatical Framework Tutorial</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta aarne (at) cs.chalmers.se</I><BR>
Last update: Sun Jul 8 18:36:23 2007
</FONT></CENTER>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">Introduction</A>
<UL>
<LI><A HREF="#toc2">GF = Grammatical Framework</A>
<LI><A HREF="#toc3">What are GF grammars used for</A>
<LI><A HREF="#toc4">Who is this tutorial for</A>
<LI><A HREF="#toc5">The coverage of the tutorial</A>
<LI><A HREF="#toc6">Getting the GF program</A>
<LI><A HREF="#toc7">Running the GF program</A>
</UL>
<LI><A HREF="#toc8">The .cf grammar format</A>
<UL>
<LI><A HREF="#toc9">Importing grammars and parsing strings</A>
<LI><A HREF="#toc10">Generating trees and strings</A>
<LI><A HREF="#toc11">Visualizing trees</A>
<LI><A HREF="#toc12">Some random-generated sentences</A>
<LI><A HREF="#toc13">Systematic generation</A>
<LI><A HREF="#toc14">More on pipes; tracing</A>
<LI><A HREF="#toc15">Writing and reading files</A>
</UL>
<LI><A HREF="#toc16">The .gf grammar format</A>
<UL>
<LI><A HREF="#toc17">Abstract and concrete syntax</A>
<LI><A HREF="#toc18">Judgement forms</A>
<LI><A HREF="#toc19">Module types</A>
<LI><A HREF="#toc20">Basic types and function types</A>
<LI><A HREF="#toc21">Records and strings</A>
<LI><A HREF="#toc22">An abstract syntax example</A>
<LI><A HREF="#toc23">A concrete syntax example</A>
<LI><A HREF="#toc24">Modules and files</A>
</UL>
<LI><A HREF="#toc25">Multilingual grammars and translation</A>
<UL>
<LI><A HREF="#toc26">An Italian concrete syntax</A>
<LI><A HREF="#toc27">Using a multilingual grammar</A>
<LI><A HREF="#toc28">Translation session</A>
<LI><A HREF="#toc29">Translation quiz</A>
</UL>
<LI><A HREF="#toc30">Grammar architecture</A>
<UL>
<LI><A HREF="#toc31">Extending a grammar</A>
<LI><A HREF="#toc32">Multiple inheritance</A>
<LI><A HREF="#toc33">Visualizing module structure</A>
<LI><A HREF="#toc34">System commands</A>
</UL>
<LI><A HREF="#toc35">Resource modules</A>
<UL>
<LI><A HREF="#toc36">The golden rule of functional programming</A>
<LI><A HREF="#toc37">Operation definitions</A>
<LI><A HREF="#toc38">The ``resource`` module type</A>
<LI><A HREF="#toc39">Opening a resource</A>
<LI><A HREF="#toc40">Partial application</A>
<LI><A HREF="#toc41">Testing resource modules</A>
<LI><A HREF="#toc42">Division of labour</A>
</UL>
<LI><A HREF="#toc43">Morphology</A>
<UL>
<LI><A HREF="#toc44">Parameters and tables</A>
<LI><A HREF="#toc45">Inflection tables and paradigms</A>
<LI><A HREF="#toc46">Worst-case functions and data abstraction</A>
<LI><A HREF="#toc47">A system of paradigms using Prelude operations</A>
<LI><A HREF="#toc48">Pattern matching</A>
<LI><A HREF="#toc49">An intelligent noun paradigm using pattern matching</A>
<LI><A HREF="#toc50">Morphological resource modules</A>
</UL>
<LI><A HREF="#toc51">Using parameters in concrete syntax</A>
<UL>
<LI><A HREF="#toc52">Parametric vs. inherent features, agreement</A>
<LI><A HREF="#toc53">English concrete syntax with parameters</A>
<LI><A HREF="#toc54">Hierarchic parameter types</A>
<LI><A HREF="#toc55">Morphological analysis and morphology quiz</A>
<LI><A HREF="#toc56">Discontinuous constituents</A>
<LI><A HREF="#toc57">Free variation</A>
<LI><A HREF="#toc58">Overloading of operations</A>
</UL>
<LI><A HREF="#toc59">More constructs for concrete syntax</A>
<UL>
<LI><A HREF="#toc60">Local definitions</A>
<LI><A HREF="#toc61">Record extension and subtyping</A>
<LI><A HREF="#toc62">Tuples and product types</A>
<LI><A HREF="#toc63">Record and tuple patterns</A>
<LI><A HREF="#toc64">Regular expression patterns</A>
<LI><A HREF="#toc65">Prefix-dependent choices</A>
<LI><A HREF="#toc66">Predefined types</A>
</UL>
<LI><A HREF="#toc67">Using the resource grammar library</A>
<UL>
<LI><A HREF="#toc68">The coverage of the library</A>
<LI><A HREF="#toc69">The resource API</A>
<LI><A HREF="#toc70">Example: French</A>
<LI><A HREF="#toc71">Functor implementation of multilingual grammars</A>
<LI><A HREF="#toc72">Interfaces and instances</A>
<LI><A HREF="#toc73">Adding languages to a functor implementation</A>
<LI><A HREF="#toc74">Division of labour revisited</A>
<LI><A HREF="#toc75">Restricted inheritance</A>
<LI><A HREF="#toc76">Browsing the resource with GF commands</A>
</UL>
<LI><A HREF="#toc77">More concepts of abstract syntax</A>
<UL>
<LI><A HREF="#toc78">GF as a logical framework</A>
<LI><A HREF="#toc79">Dependent types</A>
<LI><A HREF="#toc80">Polymorphism</A>
<LI><A HREF="#toc81">Dependent types and spoken language models</A>
<UL>
<LI><A HREF="#toc82">Grammar-based language models</A>
<LI><A HREF="#toc83">Statistical language models</A>
</UL>
<LI><A HREF="#toc84">Digression: dependent types in concrete syntax</A>
<UL>
<LI><A HREF="#toc85">Variables in function types</A>
<LI><A HREF="#toc86">Polymorphism in concrete syntax</A>
</UL>
<LI><A HREF="#toc87">Proof objects</A>
<UL>
<LI><A HREF="#toc88">Proof-carrying documents</A>
</UL>
<LI><A HREF="#toc89">Restricted polymorphism</A>
<LI><A HREF="#toc90">Variable bindings</A>
<LI><A HREF="#toc91">Semantic definitions</A>
</UL>
<LI><A HREF="#toc92">Practical issues</A>
<UL>
<LI><A HREF="#toc93">Lexers and unlexers</A>
<LI><A HREF="#toc94">Speech input and output</A>
<LI><A HREF="#toc95">Multilingual syntax editor</A>
<LI><A HREF="#toc96">Communicating with GF</A>
</UL>
<LI><A HREF="#toc97">Embedded grammars in Haskell and Java</A>
<UL>
<LI><A HREF="#toc98">Writing GF grammars</A>
<UL>
<LI><A HREF="#toc99">Creating the first grammar</A>
<LI><A HREF="#toc100">Testing</A>
<LI><A HREF="#toc101">Adding a new language</A>
<LI><A HREF="#toc102">Extending the language</A>
</UL>
<LI><A HREF="#toc103">Building a user program</A>
<UL>
<LI><A HREF="#toc104">Producing a compiled grammar package</A>
<LI><A HREF="#toc105">Writing the Haskell application</A>
<LI><A HREF="#toc106">Compiling the Haskell grammar</A>
<LI><A HREF="#toc107">Building a distribution</A>
<LI><A HREF="#toc108">Using a Makefile</A>
</UL>
</UL>
<LI><A HREF="#toc109">Embedded grammars in Java</A>
<LI><A HREF="#toc110">Further reading</A>
</UL>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
<IMG ALIGN="middle" SRC="../gf-logo.png" BORDER="0" ALT="">
</P>
<A NAME="toc1"></A>
<H1>Introduction</H1>
<A NAME="toc2"></A>
<H2>GF = Grammatical Framework</H2>
<P>
The term GF is used for different things:
</P>
<UL>
<LI>a <B>program</B> used for working with grammars
<LI>a <B>programming language</B> in which grammars can be written
<LI>a <B>theory</B> about grammars and languages
</UL>

<P>
This tutorial is primarily about the GF program and the GF programming language.
It will guide you
</P>
<UL>
<LI>to use the GF program
<LI>to write GF grammars
<LI>to write programs in which GF grammars are used as components
</UL>

<A NAME="toc3"></A>
<H2>What are GF grammars used for</H2>
<P>
A grammar is a definition of a language.
From this definition, different language processing components can be derived:
</P>
<UL>
<LI><B>parsing</B>: to analyse the language
<LI><B>linearization</B>: to generate the language
<LI><B>translation</B>: to analyse one language and generate another
</UL>

<P>
A GF grammar can be seen as a declarative program from which these
processing tasks can be automatically derived. In addition, many
other tasks are readily available for GF grammars:
</P>
<UL>
<LI><B>morphological analysis</B>: find out the possible inflection forms of words
<LI><B>morphological synthesis</B>: generate all inflection forms of words
<LI><B>random generation</B>: generate random expressions
<LI><B>corpus generation</B>: generate all expressions
<LI><B>treebank generation</B>: generate a list of trees with multiple linearizations
<LI><B>teaching quizzes</B>: train morphology and translation
<LI><B>multilingual authoring</B>: create a document in many languages simultaneously
<LI><B>speech input</B>: optimize a speech recognition system for your grammar
</UL>

<P>
A typical GF application is based on a <B>multilingual grammar</B> involving
translation on a special domain. Existing applications of this idea include
</P>
<UL>
<LI><A HREF="http://www.cs.chalmers.se/~hallgren/Alfa/Tutorial/GFplugin.html">Alfa</A>:
a natural-language interface to a proof editor
(languages: English, French, Swedish)
<LI><A HREF="http://www.key-project.org/">KeY</A>:
a multilingual authoring system for creating software specifications
(languages: OCL, English, German)
<LI><A HREF="http://www.talk-project.org">TALK</A>:
multilingual and multimodal dialogue systems
(languages: English, Finnish, French, German, Italian, Spanish, Swedish)
<LI><A HREF="http://webalt.math.helsinki.fi/content/index_eng.html">WebALT</A>:
a multilingual translator of mathematical exercises
(languages: Catalan, English, Finnish, French, Spanish, Swedish)
<LI><A HREF="http://www.cs.chalmers.se/~bringert/gf/translate/">Numeral translator</A>:
number words from 1 to 999,999
(88 languages)
</UL>

<P>
The specialization of a grammar to a domain makes it possible to
obtain much better translations than in an unlimited machine translation
system. This is due to the well-defined semantics of such domains.
Grammars of this character are called <B>application grammars</B>.
They differ from most grammars written by linguists
in that they are multilingual and domain-specific.
</P>
<P>
However, there is another kind of grammar, which we call <B>resource grammars</B>.
These are large, comprehensive grammars that can be used on any domain.
The GF Resource Grammar Library has resource grammars for 10 languages.
These grammars can be used as <B>libraries</B> to define application grammars.
In this way, it is possible to write a high-quality grammar without
knowing about linguistics: in general, writing an application grammar
by using the resource library requires only practical knowledge of
the target language, while all theoretical knowledge about its grammar
is provided by the libraries.
</P>
<A NAME="toc4"></A>
<H2>Who is this tutorial for</H2>
<P>
This tutorial is mainly for programmers who want to learn to write
application grammars. It will go through GF's programming concepts
without delving too deeply into linguistics. Thus it should
be accessible to anyone who has some previous programming experience.
</P>
<P>
A separate document has been written on how to write resource grammars: the
<A HREF="../../lib/resource-1.0/doc/Resource-HOWTO.html">Resource HOWTO</A>.
In this tutorial, we will just cover the programming concepts that are used for
solving linguistic problems in the resource grammars.
</P>
<P>
The easiest way to use GF is probably via the interactive syntax editor.
Its use does not require any knowledge of the GF formalism. There is
a separate
<A HREF="http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">Editor User Manual</A>
by Janna Khegai, covering the use of the editor. The editor is also a platform for many
kinds of GF applications, implementing the slogan
</P>
<P>
<I>write a document in a language you don't know, while seeing it in a language you know</I>.
</P>
<A NAME="toc5"></A>
<H2>The coverage of the tutorial</H2>
<P>
The tutorial gives a hands-on introduction to grammar writing.
We start by building a small grammar for the domain of food:
in this grammar, you can say things like
</P>
<PRE>
this Italian cheese is delicious
</PRE>
<P>
in English and Italian.
</P>
<P>
The first English grammar
<A HREF="food.cf"><CODE>food.cf</CODE></A>
is written in a context-free
notation (also known as BNF). The BNF format is often a good
starting point for GF grammar development, because it is
simple and widely used. However, the BNF format is not
good for multilingual grammars. While it is possible to
"translate" by just changing the words contained in a
BNF grammar to words of some other
language, proper translation usually involves more.
For instance, the order of words may have to be changed:
</P>
<PRE>
Italian cheese ===> formaggio italiano
</PRE>
<P>
The full GF grammar format is designed to support such
changes, by distinguishing between the <B>abstract syntax</B>
(the logical structure) and the <B>concrete syntax</B> (the
sequence of words) of expressions.
</P>
<P>
Words and word order are not the only things that make languages
different. Words can have different forms, and which forms
they have varies from language to language. For instance,
Italian adjectives usually have four forms where English
has just one:
</P>
<PRE>
delicious (wine, wines, pizza, pizzas)
vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose
</PRE>
<P>
The <B>morphology</B> of a language describes the
forms of its words. While the complete description of morphology
belongs to resource grammars, this tutorial will explain the
programming concepts involved in morphology. This will moreover
make it possible to grow the fragment covered by the food example.
The tutorial will in fact build a miniature resource grammar in order
to give an introduction to linguistically oriented grammar writing.
</P>
<P>
Thus it is by elaborating the initial <CODE>food.cf</CODE> example that
the tutorial makes a guided tour through all concepts of GF.
While the constructs of the GF language are the main focus,
the commands of the GF system are also introduced as they
are needed.
</P>
<P>
Learning to write GF grammars is not the only goal of
this tutorial. We will also explain the most important
commands of the GF system. With these commands,
simple applications of grammars, such as translation and
quiz systems, can be built simply by writing scripts for the
system.
</P>
<P>
More complicated applications, such as natural-language
interfaces and dialogue systems, moreover require programming in
some general-purpose language. Thus we will briefly explain how
GF grammars are used as components of Haskell programs.
Chapters on using them in Java and Javascript programs are
forthcoming; a comprehensive manual on GF embedded in Java, by Björn Bringert, is
available at
<A HREF="http://www.cs.chalmers.se/~bringert/gf/gf-java.html"><CODE>http://www.cs.chalmers.se/~bringert/gf/gf-java.html</CODE></A>.
</P>
<A NAME="toc6"></A>
<H2>Getting the GF program</H2>
<P>
The GF program is open-source free software, which you can download via the
GF Homepage:
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/GF"><CODE>http://www.cs.chalmers.se/~aarne/GF</CODE></A>
</P>
<P>
There you can download
</P>
<UL>
<LI>binaries for Linux, Mac OS X, and Windows
<LI>source code and documentation
<LI>grammar libraries and examples
</UL>

<P>
If you want to compile GF from source, you need a Haskell compiler.
To compile the interactive editor, you also need a Java compiler.
But normally you don't have to compile, and you definitely
don't need to know Haskell or Java to use GF.
</P>
<P>
We assume the availability of a Unix shell. Linux and Mac OS X users
have it automatically, the latter under the name "terminal".
Windows users are recommended to install Cygwin, the free Unix shell for Windows.
</P>
<A NAME="toc7"></A>
<H2>Running the GF program</H2>
<P>
To start the GF program, assuming you have installed it, just type
</P>
<PRE>
% gf
</PRE>
<P>
in the shell.
You will see GF's welcome message and the prompt <CODE>></CODE>.
The command
</P>
<PRE>
> help
</PRE>
<P>
will give you a list of available commands.
</P>
<P>
As a common convention in this Tutorial, we will use
</P>
<UL>
<LI><CODE>%</CODE> as a prompt that marks system commands
<LI><CODE>></CODE> as a prompt that marks GF commands
</UL>

<P>
Thus you should not type these prompts, but only the lines that
follow them.
</P>
<A NAME="toc8"></A>
<H1>The .cf grammar format</H1>
<P>
Now you are ready to try out your first grammar.
We start with one that is not written in the GF language, but
in the much more common BNF notation (Backus Naur Form). The GF
program understands a variant of this notation and translates it
internally to GF's own representation.
</P>
<P>
To get started, type (or copy) the following lines into a file named
<CODE>food.cf</CODE>:
</P>
<PRE>
Is.        S     ::= Item "is" Quality ;
That.      Item  ::= "that" Kind ;
This.      Item  ::= "this" Kind ;
QKind.     Kind  ::= Quality Kind ;
Cheese.    Kind  ::= "cheese" ;
Fish.      Kind  ::= "fish" ;
Wine.      Kind  ::= "wine" ;
Italian.   Quality ::= "Italian" ;
Boring.    Quality ::= "boring" ;
Delicious. Quality ::= "delicious" ;
Expensive. Quality ::= "expensive" ;
Fresh.     Quality ::= "fresh" ;
Very.      Quality ::= "very" Quality ;
Warm.      Quality ::= "warm" ;
</PRE>
<P>
For those who know ordinary BNF, the
notation we use includes one extra element: a <B>label</B> appearing
as the first element of each rule and terminated by a full stop.
</P>
<P>
The grammar we wrote defines a set of phrases usable for speaking about food.
It builds <B>sentences</B> (<CODE>S</CODE>) by assigning <CODE>Quality</CODE>s to
<CODE>Item</CODE>s. <CODE>Item</CODE>s are built from <CODE>Kind</CODE>s by prepending the
word "this" or "that". <CODE>Kind</CODE>s are either <B>atomic</B>, such as
"cheese" and "wine", or formed by prepending a <CODE>Quality</CODE> to a
<CODE>Kind</CODE>. A <CODE>Quality</CODE> is either atomic, such as "Italian" and "boring",
or built from another <CODE>Quality</CODE> by prepending "very". Those familiar with
the context-free grammar notation will notice that, for instance, the
following sentence can be built using this grammar:
</P>
<PRE>
this delicious Italian wine is very very expensive
</PRE>
<P></P>
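<P>
As a sketch of how the rule labels compose, a reader can check that this sentence
is derived by one application of <CODE>Is</CODE>, one of <CODE>This</CODE>, two of
<CODE>QKind</CODE>, and two of <CODE>Very</CODE>, so that its tree should be
</P>
<PRE>
Is (This (QKind Delicious (QKind Italian Wine))) (Very (Very Expensive))
</PRE>
<P>
where each label marks one application of the corresponding rule.
</P>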
<A NAME="toc9"></A>
<H2>Importing grammars and parsing strings</H2>
<P>
The first GF command needed when using a grammar is to <B>import</B> it.
The command has a long name, <CODE>import</CODE>, and a short name, <CODE>i</CODE>.
You can type either
</P>
<PRE>
> import food.cf
</PRE>
<P>
or
</P>
<PRE>
> i food.cf
</PRE>
<P>
to get the same effect.
The effect is that the GF program <B>compiles</B> your grammar into an internal
representation, and shows a new prompt when it is ready. It will also show how much
CPU time is consumed:
</P>
<PRE>
> i food.cf
- parsing cf food.cf 12 msec
16 msec
>
</PRE>
<P>
You can now use GF for <B>parsing</B>:
</P>
<PRE>
> parse "this cheese is delicious"
Is (This Cheese) Delicious

> p "that wine is very very Italian"
Is (That Wine) (Very (Very Italian))
</PRE>
<P>
The <CODE>parse</CODE> (= <CODE>p</CODE>) command takes a <B>string</B>
(in double quotes) and returns an <B>abstract syntax tree</B> - the thing
beginning with <CODE>Is</CODE>. Trees are built from the rule labels given in the
grammar, and record the ways in which the rules are used to produce the
strings. A tree is, in general, easier than a string
for a machine to understand and to process further.
</P>
<P>
Whether a string can be parsed into a tree depends on the grammar
you imported. Try parsing something else, and you will fail:
</P>
<PRE>
> p "hello world"
Unknown words: hello world
</PRE>
<P></P>
<P>
<B>Exercise</B>. Extend the grammar <CODE>food.cf</CODE> by ten new food kinds and
qualities, and run the parser with new kinds of examples.
</P>
<P>
<B>Exercise</B>. Add a rule that enables questions of the form
<I>is this cheese Italian</I>.
</P>
<P>
<B>Exercise</B>. Add the rule
</P>
<PRE>
IsVery. S ::= Item "is" "very" Quality ;
</PRE>
<P>
and see what happens when parsing <CODE>this wine is very very Italian</CODE>.
You have just made the grammar <B>ambiguous</B>: it now assigns several
trees to some strings.
</P>
<P>
<B>Exercise</B>. Modify the grammar so that at most one <CODE>Quality</CODE> may
attach to a given <CODE>Kind</CODE>. Thus <I>boring Italian fish</I> will no longer
be recognized.
</P>
<A NAME="toc10"></A>
<H2>Generating trees and strings</H2>
<P>
You can also use GF for <B>linearizing</B>
(<CODE>linearize = l</CODE>). This is the inverse of
parsing, taking trees into strings:
</P>
<PRE>
> linearize Is (That Wine) Warm
that wine is warm
</PRE>
<P>
What is the use of this? Typically not that you type in a tree at
the GF prompt. The utility of linearization comes from the fact that
you can obtain a tree from somewhere else. One way to do so is
<B>random generation</B> (<CODE>generate_random = gr</CODE>):
</P>
<PRE>
> generate_random
Is (This (QKind Italian Fish)) Fresh
</PRE>
<P>
Now you can copy the tree and paste it to the <CODE>linearize</CODE> command.
Or, more conveniently, feed random generation into linearization by using
a <B>pipe</B>:
</P>
<PRE>
> gr | l
this Italian fish is fresh
</PRE>
<P>
Pipes in GF work much the same way as Unix pipes: they feed the output
of one command into another command as its input.
</P>
<A NAME="toc11"></A>
<H2>Visualizing trees</H2>
<P>
The gibberish code with parentheses returned by the parser does not
look much like a tree. Why is it called one? From the abstract mathematical
point of view, trees are a data structure that
represents <B>nesting</B>: trees are branching entities, and the branches
are themselves trees. Parentheses give a linear representation of trees,
useful for the computer. But the human eye may prefer to see a visualization;
for this purpose, GF provides the command <CODE>visualize_tree = vt</CODE>, to which
parsing (and any other tree-producing command) can be piped:
</P>
<PRE>
> parse "this delicious cheese is very Italian" | vt
</PRE>
<P></P>
<P>
<IMG ALIGN="middle" SRC="Tree2.png" BORDER="0" ALT="">
</P>
<P>
This command uses the programs Graphviz and Ghostview, which you
might not have, but which are freely available on the web.
</P>
<A NAME="toc12"></A>
<H2>Some random-generated sentences</H2>
<P>
Random generation is a good way to test a grammar; it can also
be fun. So you may want to
generate ten strings with one and the same command:
</P>
<PRE>
> gr -number=10 | l
that wine is boring
that fresh cheese is fresh
that cheese is very boring
this cheese is Italian
that expensive cheese is expensive
that fish is fresh
that wine is very Italian
this wine is Italian
this cheese is boring
this fish is boring
</PRE>
<P></P>
<A NAME="toc13"></A>
<H2>Systematic generation</H2>
<P>
To generate <I>all</I> sentences that a grammar
can generate, use the command <CODE>generate_trees = gt</CODE>.
</P>
<PRE>
> generate_trees | l
that cheese is very Italian
that cheese is very boring
that cheese is very delicious
that cheese is very expensive
that cheese is very fresh
...
this wine is expensive
this wine is fresh
this wine is warm
</PRE>
<P>
You get quite a few trees but not all of them: only up to a given
<B>depth</B> of trees. To see how you can get more, use the
<CODE>help = h</CODE> command:
</P>
<PRE>
> help gt
</PRE>
<P></P>
<P>
<B>Exercise</B>. If the command <CODE>gt</CODE> generated all
trees in your grammar, it would never terminate. Why?
</P>
<P>
<B>Exercise</B>. Measure how many trees the grammar gives with depths 4 and 5,
respectively. You can use the Unix <B>word count</B> command <CODE>wc</CODE> to count lines.
<B>Hint</B>. You can pipe the output of a GF command into a Unix command by
using the escape <CODE>?</CODE>, as follows:
</P>
<PRE>
> generate_trees | ? wc
</PRE>
<P></P>
<A NAME="toc14"></A>
<H2>More on pipes; tracing</H2>
<P>
A pipe of GF commands can have any length, but the "output type"
(either string or tree) of one command must always match the "input type"
of the next command.
</P>
<P>
The intermediate results in a pipe can be observed by putting the
<B>tracing</B> flag <CODE>-tr</CODE> to each command whose output you
want to see:
</P>
<PRE>
> gr -tr | l -tr | p

Is (This Cheese) Boring
this cheese is boring
Is (This Cheese) Boring
</PRE>
<P>
This facility is good for test purposes: for instance, you
may want to see if a grammar is <B>ambiguous</B>, i.e.
contains strings that can be parsed in more than one way.
</P>
<P>
<B>Exercise</B>. Extend the grammar <CODE>food.cf</CODE> so that it produces ambiguous strings,
and try out the ambiguity test.
</P>
<A NAME="toc15"></A>
<H2>Writing and reading files</H2>
<P>
To save the output of GF commands into a file, you can
pipe it to the <CODE>write_file = wf</CODE> command:
</P>
<PRE>
> gr -number=10 | l | write_file exx.tmp
</PRE>
<P>
You can read the file back to GF with the
<CODE>read_file = rf</CODE> command:
</P>
<PRE>
> read_file exx.tmp | p -lines
</PRE>
<P>
Notice the flag <CODE>-lines</CODE> given to the parsing
command. This flag tells GF to parse each line of
the file separately. Without the flag, the grammar could
not recognize the string in the file, because it is not
a sentence but a sequence of ten sentences.
</P>
<A NAME="toc16"></A>
<H1>The .gf grammar format</H1>
<P>
To see GF's internal representation of a grammar
that you have imported, you can give the command
<CODE>print_grammar = pg</CODE>:
</P>
<PRE>
> print_grammar
</PRE>
<P>
The output is quite unreadable at this stage, and you may feel happy that
you did not need to write the grammar in that notation, but that the
GF grammar compiler produced it.
</P>
<P>
However, we will now start to demonstrate
how GF's own notation gives you
much more expressive power than the <CODE>.cf</CODE>
format. We will introduce the <CODE>.gf</CODE> format by presenting
another way of defining the same grammar as in
<CODE>food.cf</CODE>.
Then we will show how the full GF grammar format enables you
to do things that are not possible in the context-free format.
</P>
<A NAME="toc17"></A>
<H2>Abstract and concrete syntax</H2>
<P>
A GF grammar consists of two main parts:
</P>
<UL>
<LI><B>abstract syntax</B>, defining what syntax trees there are
<LI><B>concrete syntax</B>, defining how trees are linearized into strings
</UL>

<P>
The context-free format fuses these two things together, but it is always
possible to take them apart. For instance, the sentence formation rule
</P>
<PRE>
Is. S ::= Item "is" Quality ;
</PRE>
<P>
is interpreted as the following pair of GF rules:
</P>
<PRE>
fun Is : Item -> Quality -> S ;
lin Is item quality = {s = item.s ++ "is" ++ quality.s} ;
</PRE>
<P>
The former rule, with the keyword <CODE>fun</CODE>, belongs to the abstract syntax.
It defines the <B>function</B>
<CODE>Is</CODE> which constructs syntax trees of form
(<CODE>Is</CODE> <I>item</I> <I>quality</I>).
</P>
<P>
The latter rule, with the keyword <CODE>lin</CODE>, belongs to the concrete syntax.
It defines the <B>linearization function</B> for
syntax trees of form (<CODE>Is</CODE> <I>item</I> <I>quality</I>).
</P>
<A NAME="toc18"></A>
<H2>Judgement forms</H2>
<P>
Rules in a GF grammar are called <B>judgements</B>, and the keywords
<CODE>fun</CODE> and <CODE>lin</CODE> are used for distinguishing between two
<B>judgement forms</B>. Here is a summary of the most important
judgement forms:
</P>
<UL>
<LI>abstract syntax
<P></P>
</UL>

<TABLE ALIGN="center" CELLPADDING="4" BORDER="1">
<TR>
<TD>form</TD>
<TD>reading</TD>
</TR>
<TR>
<TD><CODE>cat</CODE> C</TD>
<TD>C is a category</TD>
</TR>
<TR>
<TD><CODE>fun</CODE> f <CODE>:</CODE> A</TD>
<TD>f is a function of type A</TD>
</TR>
</TABLE>

<P></P>
<UL>
<LI>concrete syntax
<P></P>
</UL>

<TABLE ALIGN="center" CELLPADDING="4" BORDER="1">
<TR>
<TD>form</TD>
<TD>reading</TD>
</TR>
<TR>
<TD><CODE>lincat</CODE> C <CODE>=</CODE> T</TD>
<TD>category C has linearization type T</TD>
</TR>
<TR>
<TD><CODE>lin</CODE> f <CODE>=</CODE> t</TD>
<TD>function f has linearization t</TD>
</TR>
</TABLE>

<P></P>
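<P>
As a sketch of how the four forms fit together, the <CODE>Very</CODE> rule of
<CODE>food.cf</CODE> could be expressed with one judgement of each form:
</P>
<PRE>
cat Quality ;                                    -- abstract: Quality is a category
fun Very : Quality -> Quality ;                  -- abstract: Very is a function
lincat Quality = {s : Str} ;                     -- concrete: linearization type
lin Very quality = {s = "very" ++ quality.s} ;   -- concrete: linearization rule
</PRE>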
<P>
We return to the precise meanings of these judgement forms later.
First we will look at how judgements are grouped into modules, and
show how the food grammar is
expressed by using modules and judgements.
</P>
<A NAME="toc19"></A>
<H2>Module types</H2>
<P>
A GF grammar consists of <B>modules</B>,
into which judgements are grouped. The most important
module forms are
</P>
<UL>
<LI><CODE>abstract</CODE> A <CODE>=</CODE> M, abstract syntax A with judgements in
the module body M.
<LI><CODE>concrete</CODE> C <CODE>of</CODE> A <CODE>=</CODE> M, concrete syntax C of the
abstract syntax A, with judgements in the module body M.
</UL>
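<P>
As a minimal sketch of the two forms (the module names are our choice here;
the actual modules for the food grammar are developed in the following sections),
a fragment of the food grammar could be split into an abstract and a concrete
module as follows:
</P>
<PRE>
abstract Food = {
  cat S ; Item ; Quality ;
  fun Is : Item -> Quality -> S ;
}

concrete FoodEng of Food = {
  lincat S, Item, Quality = {s : Str} ;
  lin Is item quality = {s = item.s ++ "is" ++ quality.s} ;
}
</PRE>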
<A NAME="toc20"></A>
<H2>Basic types and function types</H2>
<P>
The nonterminals of a context-free grammar, i.e. categories,
are called <B>basic types</B> in the type system of GF. In addition
to them, there are <B>function types</B> such as
</P>
<PRE>
  Item -> Quality -> S
</PRE>
<P>
This type is read "a function from items and qualities to sentences".
The last type in the arrow-separated sequence is the <B>value type</B>
of the function type, the earlier types are its <B>argument types</B>.
</P>
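<P>
In the abstract syntax below, this is exactly the type given to the
sentence-forming function:
</P>
<PRE>
  fun Is : Item -> Quality -> S ;
</PRE>
<P>
<CODE>Is</CODE> takes an <CODE>Item</CODE> and a <CODE>Quality</CODE> as arguments
and returns a value of the basic type <CODE>S</CODE>.
</P>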
<A NAME="toc21"></A>
<H2>Records and strings</H2>
<P>
The linearization type of a category is a <B>record type</B>, with
zero or more <B>fields</B> of different types. The simplest record
type used for linearization in GF is
</P>
<PRE>
  {s : Str}
</PRE>
<P>
which has one field, with <B>label</B> <CODE>s</CODE> and type <CODE>Str</CODE>.
</P>
<P>
Examples of records of this type are
</P>
<PRE>
  {s = "foo"}
  {s = "hello" ++ "world"}
</PRE>
<P></P>
<P>
Whenever a record <CODE>r</CODE> of type <CODE>{s : Str}</CODE> is given,
<CODE>r.s</CODE> is an object of type <CODE>Str</CODE>. This is
a special case of the <B>projection</B> rule, allowing the extraction
of fields from a record:
</P>
<UL>
<LI>if <I>r</I> : <CODE>{</CODE> ... <I>p</I> : <I>T</I> ... <CODE>}</CODE> then <I>r.p</I> : <I>T</I>
</UL>
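<P>
For instance, projecting the <CODE>s</CODE> field from the records shown above,
</P>
<PRE>
  {s = "foo"}.s
  {s = "hello" ++ "world"}.s
</PRE>
<P>
gives back the strings they hold: <CODE>"foo"</CODE> and
<CODE>"hello" ++ "world"</CODE>, respectively.
</P>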
<P>
The type <CODE>Str</CODE> is really the type of <B>token lists</B>, but
most of the time one can conveniently think of it as the type of strings,
denoted by string literals in double quotes.
</P>
<P>
Notice that
</P>
<PRE>
  "hello world"
</PRE>
<P>
is not recommended as an expression of type <CODE>Str</CODE>. It denotes
a token with a space in it, and will usually
not work with the lexical analysis that precedes parsing. A shorthand
exemplified by
</P>
<PRE>
  ["hello world and people"] === "hello" ++ "world" ++ "and" ++ "people"
</PRE>
<P>
can be used for lists of tokens. The expression
</P>
<PRE>
  []
</PRE>
<P>
denotes the empty token list.
</P>
<A NAME="toc22"></A>
<H2>An abstract syntax example</H2>
<P>
To express the abstract syntax of <CODE>food.cf</CODE> in
a file <CODE>Food.gf</CODE>, we write two kinds of judgements:
</P>
<UL>
<LI>Each category is introduced by a <CODE>cat</CODE> judgement.
<LI>Each rule label is introduced by a <CODE>fun</CODE> judgement,
with the type formed from the nonterminals of the rule.
</UL>

<PRE>
abstract Food = {

  cat
    S ; Item ; Kind ; Quality ;

  fun
    Is : Item -> Quality -> S ;
    This, That : Kind -> Item ;
    QKind : Quality -> Kind -> Kind ;
    Wine, Cheese, Fish : Kind ;
    Very : Quality -> Quality ;
    Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ;
}
</PRE>
<P>
Notice the use of shorthands permitting the sharing of
the keyword in subsequent judgements,
</P>
<PRE>
  cat S ; Item ; === cat S ; cat Item ;
</PRE>
<P>
and of the type in subsequent <CODE>fun</CODE> judgements,
</P>
<PRE>
  fun Wine, Fish : Kind ; ===
  fun Wine : Kind ; Fish : Kind ; ===
  fun Wine : Kind ; fun Fish : Kind ;
</PRE>
<P>
The order of judgements in a module is free.
</P>
<P>
<B>Exercise</B>. Extend the abstract syntax <CODE>Food</CODE> with ten new
kinds and qualities, and with questions of the form
<I>is this wine Italian</I>.
</P>
<A NAME="toc23"></A>
<H2>A concrete syntax example</H2>
<P>
Each category introduced in <CODE>Food.gf</CODE> is
given a <CODE>lincat</CODE> rule, and each
function is given a <CODE>lin</CODE> rule. Similar shorthands
apply as in <CODE>abstract</CODE> modules.
</P>
<PRE>
concrete FoodEng of Food = {

  lincat
    S, Item, Kind, Quality = {s : Str} ;

  lin
    Is item quality = {s = item.s ++ "is" ++ quality.s} ;
    This kind = {s = "this" ++ kind.s} ;
    That kind = {s = "that" ++ kind.s} ;
    QKind quality kind = {s = quality.s ++ kind.s} ;
    Wine = {s = "wine"} ;
    Cheese = {s = "cheese"} ;
    Fish = {s = "fish"} ;
    Very quality = {s = "very" ++ quality.s} ;
    Fresh = {s = "fresh"} ;
    Warm = {s = "warm"} ;
    Italian = {s = "Italian"} ;
    Expensive = {s = "expensive"} ;
    Delicious = {s = "delicious"} ;
    Boring = {s = "boring"} ;
}
</PRE>
<P></P>
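<P>
To see how these rules fit together, it helps to linearize a tree by hand.
For the tree <CODE>Is (This Cheese) Warm</CODE>, the rules give (this trace is
just an illustration, not GF syntax):
</P>
<PRE>
  This Cheese            --  {s = "this" ++ "cheese"}
  Is (This Cheese) Warm  --  {s = "this" ++ "cheese" ++ "is" ++ "warm"}
</PRE>
<P>
so the tree comes out as the string <I>this cheese is warm</I>.
</P>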
<P>
<B>Exercise</B>. Extend the concrete syntax <CODE>FoodEng</CODE> so that it
matches the abstract syntax defined in the exercise of the previous
section. What happens if the concrete syntax lacks some of the
new functions?
</P>
<A NAME="toc24"></A>
<H2>Modules and files</H2>
<P>
GF uses suffixes to recognize different file formats. The most
important ones are:
</P>
<UL>
<LI>Source files: Module name + <CODE>.gf</CODE> = file name
<LI>Target files: each module is compiled into a <CODE>.gfc</CODE> file.
</UL>

<P>
Import <CODE>FoodEng.gf</CODE> and see what happens:
</P>
<PRE>
  > i FoodEng.gf
  - compiling Food.gf... wrote file Food.gfc 16 msec
  - compiling FoodEng.gf... wrote file FoodEng.gfc 20 msec
</PRE>
<P>
The GF program reads not only the file
<CODE>FoodEng.gf</CODE>, but also all other files that it
depends on - in this case, <CODE>Food.gf</CODE>.
</P>
<P>
For each file that is compiled, a <CODE>.gfc</CODE> file
is generated. The GFC format (="GF Canonical") is the
"machine code" of GF, which is faster to process than
GF source files. When reading a module, GF decides whether
to use an existing <CODE>.gfc</CODE> file or to generate
a new one, by looking at modification times.
</P>
<P>
<B>Exercise</B>. What happens when you import <CODE>FoodEng.gf</CODE> for
a second time? Try this in different situations:
</P>
<UL>
<LI>Right after importing it the first time (the modules are kept in
the memory of GF and need no reloading).
<LI>After issuing the command <CODE>empty</CODE> (<CODE>e</CODE>), which clears the memory
of GF.
<LI>After making a small change in <CODE>FoodEng.gf</CODE>, be it only an added space.
<LI>After making a change in <CODE>Food.gf</CODE>.
</UL>
<A NAME="toc25"></A>
<H1>Multilingual grammars and translation</H1>
<P>
The main advantage of separating abstract from concrete syntax is that
one abstract syntax can be equipped with many concrete syntaxes.
A system with this property is called a <B>multilingual grammar</B>.
</P>
<P>
Multilingual grammars can be used for applications such as
translation. Let us build an Italian concrete syntax for
<CODE>Food</CODE> and then test the resulting
multilingual grammar.
</P>
<A NAME="toc26"></A>
<H2>An Italian concrete syntax</H2>
<PRE>
concrete FoodIta of Food = {

  lincat
    S, Item, Kind, Quality = {s : Str} ;

  lin
    Is item quality = {s = item.s ++ "è" ++ quality.s} ;
    This kind = {s = "questo" ++ kind.s} ;
    That kind = {s = "quello" ++ kind.s} ;
    QKind quality kind = {s = kind.s ++ quality.s} ;
    Wine = {s = "vino"} ;
    Cheese = {s = "formaggio"} ;
    Fish = {s = "pesce"} ;
    Very quality = {s = "molto" ++ quality.s} ;
    Fresh = {s = "fresco"} ;
    Warm = {s = "caldo"} ;
    Italian = {s = "italiano"} ;
    Expensive = {s = "caro"} ;
    Delicious = {s = "delizioso"} ;
    Boring = {s = "noioso"} ;
}
</PRE>
<P></P>
<P>
<B>Exercise</B>. Write a concrete syntax of <CODE>Food</CODE> for some other language.
You will probably end up with grammatically incorrect output - but don't
worry about this yet.
</P>
<P>
<B>Exercise</B>. If you have written <CODE>Food</CODE> for German, Swedish, or some
other language, test with random or exhaustive generation what constructs
come out incorrect, and prepare a list of those that cannot be helped
with the currently available fragment of GF.
</P>
<A NAME="toc27"></A>
<H2>Using a multilingual grammar</H2>
<P>
Import the two grammars in the same GF session.
</P>
<PRE>
  > i FoodEng.gf
  > i FoodIta.gf
</PRE>
<P>
Try generation now:
</P>
<PRE>
  > gr | l
  quello formaggio molto noioso è italiano

  > gr | l -lang=FoodEng
  this fish is warm
</PRE>
<P>
Translate by using a pipe:
</P>
<PRE>
  > p -lang=FoodEng "this cheese is very delicious" | l -lang=FoodIta
  questo formaggio è molto delizioso
</PRE>
<P>
Generate a <B>multilingual treebank</B>, i.e. a set of trees with their
translations in different languages:
</P>
<PRE>
  > gr -number=2 | tree_bank

  Is (That Cheese) (Very Boring)
  quello formaggio è molto noioso
  that cheese is very boring

  Is (That Cheese) Fresh
  quello formaggio è fresco
  that cheese is fresh
</PRE>
<P>
The <CODE>lang</CODE> flag tells GF which concrete syntax to use in parsing and
linearization. By default, the flag is set to the last-imported grammar.
To see what grammars are in scope and which is the main one, use the command
<CODE>print_options = po</CODE>:
</P>
<PRE>
  > print_options
  main abstract : Food
  main concrete : FoodIta
  actual concretes : FoodIta FoodEng
</PRE>
<P>
You can change the main grammar by the command <CODE>change_main = cm</CODE>:
</P>
<PRE>
  > change_main FoodEng
  main abstract : Food
  main concrete : FoodEng
  actual concretes : FoodIta FoodEng
</PRE>
<P></P>
<A NAME="toc28"></A>
<H2>Translation session</H2>
<P>
If translation is what you want to do with a set of grammars, a convenient
way to do it is to open a <CODE>translation_session = ts</CODE>. In this session,
you can translate between all the languages that are in scope.
A dot <CODE>.</CODE> terminates the translation session.
</P>
<PRE>
  > ts

  trans> that very warm cheese is boring
  quello formaggio molto caldo è noioso
  that very warm cheese is boring

  trans> questo vino molto italiano è molto delizioso
  questo vino molto italiano è molto delizioso
  this very Italian wine is very delicious

  trans> .
  >
</PRE>
<P></P>
<A NAME="toc29"></A>
<H2>Translation quiz</H2>
<P>
This is a simple language exercise that can be automatically
generated from a multilingual grammar. The system generates a set of
random sentences, displays them in one language, and checks the user's
answer given in another language. The command <CODE>translation_quiz = tq</CODE>
runs this in a subshell of GF.
</P>
<PRE>
  > translation_quiz FoodEng FoodIta

  Welcome to GF Translation Quiz.
  The quiz is over when you have done at least 10 examples
  with at least 75 % success.
  You can interrupt the quiz by entering a line consisting of a dot ('.').

  this fish is warm
  questo pesce è caldo
  > Yes.
  Score 1/1

  this cheese is Italian
  questo formaggio è noioso
  > No, not questo formaggio è noioso, but
  questo formaggio è italiano

  Score 1/2
  this fish is expensive
</PRE>
<P>
You can also generate a list of translation exercises and save it in a
file for later use, by the command <CODE>translation_list = tl</CODE>:
</P>
<PRE>
  > translation_list -number=25 FoodEng FoodIta | write_file transl.txt
</PRE>
<P>
The <CODE>number</CODE> flag gives the number of sentences generated.
</P>
<A NAME="toc30"></A>
<H1>Grammar architecture</H1>
<A NAME="toc31"></A>
<H2>Extending a grammar</H2>
<P>
The module system of GF makes it possible to <B>extend</B> a
grammar in different ways. The syntax of extension is
shown by the following example. We extend <CODE>Food</CODE> by
adding a category of questions and two new functions.
</P>
<PRE>
abstract Morefood = Food ** {
  cat
    Question ;
  fun
    QIs : Item -> Quality -> Question ;
    Pizza : Kind ;
}
</PRE>
<P>
Parallel to the abstract syntax, extensions can
be built for concrete syntaxes:
</P>
<PRE>
concrete MorefoodEng of Morefood = FoodEng ** {
  lincat
    Question = {s : Str} ;
  lin
    QIs item quality = {s = "is" ++ item.s ++ quality.s} ;
    Pizza = {s = "pizza"} ;
}
</PRE>
<P>
The effect of extension is that all of the contents of the extended
and extending module are put together. We also say that the new
module <B>inherits</B> the contents of the old module.
</P>
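<P>
An Italian counterpart is built in the same way. The following sketch is not
part of the tutorial; the Italian wording and word order here are plausible
assumptions in the style of <CODE>FoodIta</CODE>:
</P>
<PRE>
concrete MorefoodIta of Morefood = FoodIta ** {
  lincat
    Question = {s : Str} ;
  lin
    QIs item quality = {s = item.s ++ "è" ++ quality.s} ;
    Pizza = {s = "pizza"} ;
}
</PRE>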
<A NAME="toc32"></A>
<H2>Multiple inheritance</H2>
<P>
Specialized vocabularies can be represented as small grammars that
only do "one thing" each. For instance, the following are grammars
for fruit and mushrooms.
</P>
<PRE>
abstract Fruit = {
  cat Fruit ;
  fun Apple, Peach : Fruit ;
}

abstract Mushroom = {
  cat Mushroom ;
  fun Cep, Agaric : Mushroom ;
}
</PRE>
<P>
They can afterwards be combined into bigger grammars by using
<B>multiple inheritance</B>, i.e. extension of several grammars at the
same time:
</P>
<PRE>
abstract Foodmarket = Food, Fruit, Mushroom ** {
  fun
    FruitKind : Fruit -> Kind ;
    MushroomKind : Mushroom -> Kind ;
}
</PRE>
<P>
At this point, you would perhaps like to go back to
<CODE>Food</CODE> and take apart <CODE>Wine</CODE> to build a special
<CODE>Drink</CODE> module.
</P>
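<P>
The concrete syntax uses multiple inheritance in the same way, extending the
concrete syntaxes of all three grammars at once. This sketch assumes the
module names <CODE>FruitEng</CODE> and <CODE>MushroomEng</CODE> and a
<CODE>{s : Str}</CODE> linearization type throughout:
</P>
<PRE>
concrete FoodmarketEng of Foodmarket = FoodEng, FruitEng, MushroomEng ** {
  lin
    FruitKind fruit = {s = fruit.s} ;
    MushroomKind mushroom = {s = mushroom.s} ;
}
</PRE>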
<A NAME="toc33"></A>
<H2>Visualizing module structure</H2>
<P>
When you have created all the abstract syntaxes and
one set of concrete syntaxes needed for <CODE>Foodmarket</CODE>,
your grammar consists of eight GF modules. To see what their
dependencies look like, you can use the command
<CODE>visualize_graph = vg</CODE>,
</P>
<PRE>
  > visualize_graph
</PRE>
<P>
and the graph will pop up in a separate window.
</P>
<P>
The graph uses
</P>
<UL>
<LI>oval boxes for abstract modules
<LI>square boxes for concrete modules
<LI>black-headed arrows for inheritance
<LI>white-headed arrows for the concrete-of-abstract relation
</UL>

<P>
<IMG ALIGN="middle" SRC="Foodmarket.png" BORDER="0" ALT="">
</P>
<P>
As with the <CODE>visualize_tree = vt</CODE> command, the open-source tools
Ghostview and Graphviz are needed.
</P>
<A NAME="toc34"></A>
<H2>System commands</H2>
<P>
To document your grammar, you may want to print the
graph into a file, e.g. a <CODE>.png</CODE> file that
can be included in an HTML document. You can do this
by first printing the graph into a <CODE>.dot</CODE> file and then
processing this file with the <CODE>dot</CODE> program (from the Graphviz package).
</P>
<PRE>
  > pm -printer=graph | wf Foodmarket.dot
  > ! dot -Tpng Foodmarket.dot > Foodmarket.png
</PRE>
<P>
The latter command is a Unix command, issued from GF by using the
shell escape symbol <CODE>!</CODE>. The resulting graph was shown in the previous section.
</P>
<P>
The command <CODE>print_multi = pm</CODE> is used for printing the current multilingual
grammar in various formats, of which the format <CODE>-printer=graph</CODE> just
shows the module dependencies. Use <CODE>help</CODE> to see what other formats
are available:
</P>
<PRE>
  > help pm
  > help -printer
  > help help
</PRE>
<P>
System commands can also be used in GF pipes. The escape symbol
is then <CODE>?</CODE>.
</P>
<PRE>
  > generate_trees | ? wc
</PRE>
<P></P>
<A NAME="toc35"></A>
<H1>Resource modules</H1>
<A NAME="toc36"></A>
<H2>The golden rule of functional programming</H2>
<P>
In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format looks rather
verbose, and demands many more characters to be written. You have probably
done this by the copy-paste-modify method, which is a common way to
avoid repeating work.
</P>
<P>
However, there is a more elegant way to avoid repeating work than the copy-and-paste
method. The <B>golden rule of functional programming</B> says that
</P>
<UL>
<LI>whenever you find yourself programming by copy-and-paste, write a function instead.
</UL>

<P>
A function separates the shared parts of different computations from the
changing parts, its <B>arguments</B>, or <B>parameters</B>.
In functional programming languages, such as
<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share much more
code with functions than in imperative languages such as C and Java.
</P>
<A NAME="toc37"></A>
<H2>Operation definitions</H2>
<P>
GF is a functional programming language, not only in the sense that
the abstract syntax is a system of functions (<CODE>fun</CODE>), but also because
functional programming can be used to define concrete syntax. This is
done by using a new form of judgement, with the keyword <CODE>oper</CODE> (for
<B>operation</B>), distinct from <CODE>fun</CODE> for the sake of clarity.
Here is a simple example of an operation:
</P>
<PRE>
  oper ss : Str -> {s : Str} = \x -> {s = x} ;
</PRE>
<P>
The operation can be <B>applied</B> to an argument, and GF will
<B>compute</B> the application into a value. For instance,
</P>
<PRE>
  ss "boy" ===> {s = "boy"}
</PRE>
<P>
(We use the symbol <CODE>===></CODE> to indicate how an expression is
computed into a value; this symbol is not a part of GF.)
</P>
<P>
Thus an <CODE>oper</CODE> judgement includes the name of the defined operation,
its type, and an expression defining it. As for the syntax of the defining
expression, notice the <B>lambda abstraction</B> form <CODE>\x -> t</CODE> of
the function.
</P>
<A NAME="toc38"></A>
<H2>The <CODE>resource</CODE> module type</H2>
<P>
Operation definitions can be included in a concrete syntax.
But they are not really tied to a particular set of linearization rules.
They should rather be seen as <B>resources</B>
usable in many concrete syntaxes.
</P>
<P>
The <CODE>resource</CODE> module type can be used to package
<CODE>oper</CODE> definitions into reusable resources. Here is
an example, with a handful of operations to manipulate
strings and records.
</P>
<PRE>
resource StringOper = {
  oper
    SS : Type = {s : Str} ;
    ss : Str -> SS = \x -> {s = x} ;
    cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;
    prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
}
</PRE>
<P>
Resource modules can extend other resource modules, in the
same way as modules of other types can extend modules of the
same type. Thus it is possible to build resource hierarchies.
</P>
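<P>
For instance, the operations of <CODE>StringOper</CODE> compute as follows:
</P>
<PRE>
  cc (ss "hello") (ss "world")  ===>  {s = "hello" ++ "world"}
  prefix "very" (ss "warm")     ===>  {s = "very" ++ "warm"}
</PRE>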
<A NAME="toc39"></A>
<H2>Opening a resource</H2>
<P>
Any number of <CODE>resource</CODE> modules can be
<B>opened</B> in a <CODE>concrete</CODE> syntax, which
makes definitions contained
in the resource usable in the concrete syntax. Here is
an example, where the resource <CODE>StringOper</CODE> is
opened in a new version of <CODE>FoodEng</CODE>.
</P>
<PRE>
concrete Food2Eng of Food = open StringOper in {

  lincat
    S, Item, Kind, Quality = SS ;

  lin
    Is item quality = cc item (prefix "is" quality) ;
    This k = prefix "this" k ;
    That k = prefix "that" k ;
    QKind k q = cc k q ;
    Wine = ss "wine" ;
    Cheese = ss "cheese" ;
    Fish = ss "fish" ;
    Very = prefix "very" ;
    Fresh = ss "fresh" ;
    Warm = ss "warm" ;
    Italian = ss "Italian" ;
    Expensive = ss "expensive" ;
    Delicious = ss "delicious" ;
    Boring = ss "boring" ;
}
</PRE>
<P>
<B>Exercise</B>. Use the same string operations to write <CODE>FoodIta</CODE>
more concisely.
</P>
<A NAME="toc40"></A>
<H2>Partial application</H2>
<P>
GF, like Haskell, permits <B>partial application</B> of
functions. An example of this is the rule
</P>
<PRE>
  lin This k = prefix "this" k ;
</PRE>
<P>
which can be written more concisely
</P>
<PRE>
  lin This = prefix "this" ;
</PRE>
<P>
The first form is perhaps more intuitive to write
but, once you get used to partial application, you will appreciate its
conciseness and elegance. The logic of partial application
is known as <B>currying</B>, with a reference to Haskell B. Curry.
The idea is that any <I>n</I>-place function can be defined as a 1-place
function whose value is an (<I>n</I>-1)-place function. Thus
</P>
<PRE>
  oper prefix : Str -> SS -> SS ;
</PRE>
<P>
can be used as a 1-place function that takes a <CODE>Str</CODE> into a
function <CODE>SS -> SS</CODE>. The expected linearization of <CODE>This</CODE> is exactly
a function of such a type, operating on an argument of type <CODE>Kind</CODE>
whose linearization is of type <CODE>SS</CODE>. Thus we can define the
linearization directly as <CODE>prefix "this"</CODE>.
</P>
<P>
<B>Exercise</B>. Define an operation <CODE>infix</CODE> analogous to <CODE>prefix</CODE>,
such that it allows you to write
</P>
<PRE>
  lin Is = infix "is" ;
</PRE>
<P></P>
<A NAME="toc41"></A>
<H2>Testing resource modules</H2>
<P>
To test a <CODE>resource</CODE> module independently, you must import it
with the flag <CODE>-retain</CODE>, which tells GF to retain <CODE>oper</CODE> definitions
in the memory; the usual behaviour is that <CODE>oper</CODE> definitions
are just applied to compile linearization rules
(this is called <B>inlining</B>) and then thrown away.
</P>
<PRE>
  > i -retain StringOper.gf
</PRE>
<P>
The command <CODE>compute_concrete = cc</CODE> computes any expression
formed by operations and other GF constructs. For example,
</P>
<PRE>
  > compute_concrete prefix "in" (ss "addition")
  {
    s : Str = "in" ++ "addition"
  }
</PRE>
<P></P>
<A NAME="toc42"></A>
<H2>Division of labour</H2>
<P>
Using operations defined in resource modules is a
way to avoid repetitive code.
In addition, it enables a new kind of modularity
and division of labour in grammar writing: grammarians familiar with
the linguistic details of a language can make their knowledge
available through resource grammar modules, whose users need only
pick the right operations, without knowing their implementation
details.
</P>
<P>
In the following sections, we will go through some
such linguistic details. The programming constructs needed when
doing this are useful for all GF programmers, even if they don't
hand-code the linguistics of their applications but get them
from libraries. It is also useful to know something about the
linguistic concepts of inflection, agreement, and parts of speech.
</P>
<A NAME="toc43"></A>
<H1>Morphology</H1>
<P>
Suppose we want to say, with the vocabulary included in
<CODE>Food.gf</CODE>, things like
</P>
<PRE>
  all Italian wines are delicious
</PRE>
<P>
The new grammatical facilities we need are the plural forms
of nouns and verbs (<I>wines, are</I>), as opposed to their
singular forms.
</P>
<P>
The introduction of plural forms requires two things:
</P>
<UL>
<LI>the <B>inflection</B> of nouns and verbs in singular and plural
<LI>the <B>agreement</B> of the verb with the subject:
the verb must have the same number as the subject
</UL>

<P>
Different languages have different rules of inflection and agreement.
For instance, Italian also has agreement in gender (masculine vs. feminine).
We want to express such special features of languages in the
concrete syntax while ignoring them in the abstract syntax.
</P>
<P>
To be able to do all this, we need one new judgement form
and many new expression forms.
We also need to generalize linearization types
from strings to more complex types.
</P>
<P>
<B>Exercise</B>. Make a list of the possible forms that nouns,
adjectives, and verbs can have in some languages that you know.
</P>
<A NAME="toc44"></A>
<H2>Parameters and tables</H2>
<P>
We define the <B>parameter type</B> of number in English by
using a new form of judgement:
</P>
<PRE>
  param Number = Sg | Pl ;
</PRE>
<P>
To express that <CODE>Kind</CODE> expressions in English have a linearization
depending on number, we replace the linearization type <CODE>{s : Str}</CODE>
with a type where the <CODE>s</CODE> field is a <B>table</B> depending on number:
</P>
<PRE>
  lincat Kind = {s : Number => Str} ;
</PRE>
<P>
The <B>table type</B> <CODE>Number => Str</CODE> is in many respects similar to
a function type (<CODE>Number -> Str</CODE>). The main difference is that the
argument type of a table type must always be a parameter type. This means
that the argument-value pairs can be listed in a finite table. The following
example shows such a table:
</P>
<PRE>
  lin Cheese = {s = table {
    Sg => "cheese" ;
    Pl => "cheeses"
    }
  } ;
</PRE>
<P>
The table consists of <B>branches</B>, where a <B>pattern</B> on the
left of the arrow <CODE>=></CODE> is assigned a <B>value</B> on the right.
</P>
<P>
The application of a table to a parameter is done by the <B>selection</B>
operator <CODE>!</CODE>. For instance,
</P>
<PRE>
  table {Sg => "cheese" ; Pl => "cheeses"} ! Pl
</PRE>
<P>
is a selection that computes into the value <CODE>"cheeses"</CODE>.
This computation is performed by <B>pattern matching</B>: return
the value from the first branch whose pattern matches the
selection argument. Thus
</P>
<PRE>
  table {Sg => "cheese" ; Pl => "cheeses"} ! Pl
  ===> "cheeses"
</PRE>
<P></P>
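<P>
The gender agreement of Italian, mentioned above, can be expressed with the
same machinery. The following lines are only a sketch, not part of the
tutorial's grammars; a full treatment would also thread the gender through
the sentence rules:
</P>
<PRE>
  param Gender = Masc | Fem ;

  lincat Quality = {s : Gender => Str} ;

  lin Fresh = {s = table {Masc => "fresco" ; Fem => "fresca"}} ;
</PRE>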
<P>
<B>Exercise</B>. In a previous exercise, we made a list of the possible
forms that nouns, adjectives, and verbs can have in some languages that
you know. Now take some of the results and implement them by
using parameter type definitions and tables. Write them into a <CODE>resource</CODE>
module, which you can test by using the command <CODE>compute_concrete</CODE>.
</P>
<A NAME="toc45"></A>
<H2>Inflection tables and paradigms</H2>
<P>
All English common nouns are inflected in number, most of them in the
same way: the plural form is obtained from the singular by adding the
ending <I>s</I>. This rule is an example of
a <B>paradigm</B> - a formula telling how the inflection
forms of a word are formed.
</P>
<P>
From the GF point of view, a paradigm is a function that takes a <B>lemma</B> -
also known as a <B>dictionary form</B> - and returns an inflection
table of the desired type. Paradigms are not functions in the sense of the
<CODE>fun</CODE> judgements of abstract syntax (which operate on trees and not
on strings), but operations defined in <CODE>oper</CODE> judgements.
The following operation defines the regular noun paradigm of English:
</P>
<PRE>
  oper regNoun : Str -> {s : Number => Str} = \x -> {
    s = table {
      Sg => x ;
      Pl => x + "s"
      }
    } ;
</PRE>
<P>
The <B>gluing</B> operator <CODE>+</CODE> tells that
the string held in the variable <CODE>x</CODE> and the ending <CODE>"s"</CODE>
are written together to form one <B>token</B>. Thus, for instance,
</P>
<PRE>
  (regNoun "cheese").s ! Pl  ===>  "cheese" + "s"  ===>  "cheeses"
</PRE>
<P></P>
<P>
<B>Exercise</B>. Identify cases in which the <CODE>regNoun</CODE> paradigm does not
apply in English, and implement some alternative paradigms.
</P>
<P>
<B>Exercise</B>. Implement a paradigm for regular verbs in English.
</P>
<P>
<B>Exercise</B>. Implement some regular paradigms for other languages you have
considered in earlier exercises.
</P>
<A NAME="toc46"></A>
<H2>Worst-case functions and data abstraction</H2>
<P>
Some English nouns, such as <CODE>mouse</CODE>, are so irregular that
it makes no sense to see them as instances of a paradigm. Even
then, it is useful to perform <B>data abstraction</B> from the
definition of the type <CODE>Noun</CODE>, and introduce a constructor
operation, a <B>worst-case function</B> for nouns:
</P>
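<P>
Here <CODE>Noun</CODE> stands for the noun linearization type of the previous
section; as an abbreviation it would be defined by an <CODE>oper</CODE>
judgement (this definition is left implicit in the text):
</P>
<PRE>
  oper Noun : Type = {s : Number => Str} ;
</PRE>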
<PRE>
  oper mkNoun : Str -> Str -> Noun = \x,y -> {
    s = table {
      Sg => x ;
      Pl => y
      }
    } ;
</PRE>
<P>
Thus we can define
</P>
<PRE>
  lin Mouse = mkNoun "mouse" "mice" ;
</PRE>
<P>
and
</P>
<PRE>
  oper regNoun : Str -> Noun = \x ->
    mkNoun x (x + "s") ;
</PRE>
<P>
instead of writing the inflection tables explicitly.
</P>
<P>
The grammar engineering advantage of worst-case functions is that
the author of the resource module may change the definitions of
<CODE>Noun</CODE> and <CODE>mkNoun</CODE>, and still retain the
interface (i.e. the system of type signatures) that makes it
correct to use these functions in concrete modules. In programming
terms, <CODE>Noun</CODE> is then treated as an <B>abstract datatype</B>.
</P>
<A NAME="toc47"></A>
<H2>A system of paradigms using Prelude operations</H2>
<P>
In addition to the completely regular noun paradigm <CODE>regNoun</CODE>,
some other frequent noun paradigms deserve to be
defined, for instance,
</P>
<PRE>
  sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ;
</PRE>
<P>
What about nouns like <I>fly</I>, with the plural <I>flies</I>? The already
available solution is to use the longest common prefix
<I>fl</I> (also known as the <B>technical stem</B>) as argument, and define
</P>
<PRE>
  yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ;
</PRE>
<P>
But this paradigm would be very unintuitive to use, because the technical stem
is not an existing form of the word. A better solution is to use
the lemma and a string operator <CODE>init</CODE>, which returns the initial segment (i.e.
all characters but the last) of a string:
</P>
<PRE>
  yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
</PRE>
<P>
The operation <CODE>init</CODE> belongs to a set of operations in the
resource module <CODE>Prelude</CODE>, which therefore has to be
<CODE>open</CODE>ed so that <CODE>init</CODE> can be used. Its dual is <CODE>last</CODE>:
</P>
<PRE>
  > cc init "curry"
  "curr"

  > cc last "curry"
  "y"
</PRE>
<P>
As generalizations of the library functions <CODE>init</CODE> and <CODE>last</CODE>, GF has
two predefined functions:
<CODE>Predef.dp</CODE>, which "drops" suffixes of any length,
and <CODE>Predef.tk</CODE>, which "takes" a prefix
by omitting a number of characters from the end. For instance,
</P>
<PRE>
  > cc Predef.tk 3 "worried"
  "worr"
  > cc Predef.dp 3 "worried"
  "ied"
</PRE>
<P>
The prefix <CODE>Predef</CODE> is given to a handful of functions that could
not be defined internally in GF. They are available in all modules
without an explicit <CODE>open</CODE> of the module <CODE>Predef</CODE>.
</P>
|
|
<A NAME="toc48"></A>
|
|
<H2>Pattern matching</H2>
|
|
<P>
|
|
We have so far built all expressions of the <CODE>table</CODE> form
|
|
from branches whose patterns are constants introduced in
|
|
<CODE>param</CODE> definitions, as well as constant strings.
|
|
But there are more expressive patterns. Here is a summary of the possible forms:
|
|
</P>
|
|
<UL>
|
|
<LI>a variable pattern (identifier other than constant parameter) matches anything
|
|
<LI>the wild card <CODE>_</CODE> matches anything
|
|
<LI>a string literal pattern, e.g. <CODE>"s"</CODE>, matches the same string
|
|
<LI>a disjunctive pattern <CODE>P | ... | Q</CODE> matches anything that
|
|
one of the disjuncts matches
|
|
</UL>
|
|
|
|
<P>
|
|
Pattern matching is performed in the order in which the branches
|
|
appear in the table: the branch of the first matching pattern is followed.
|
|
</P>
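<P>
For instance, in the following sketch (assuming the <CODE>Prelude</CODE>
operations <CODE>init</CODE> and <CODE>last</CODE>), the wild card must come
last: since branches are tried in order, a wild card placed first would match
every string and shadow the <CODE>"y"</CODE> branch.
</P>
<PRE>
  plural : Str -> Str = \w -> case last w of {
    "y" => init w + "ies" ;
    _   => w + "s"
  } ;
</PRE>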
|
|
<P>
|
|
As syntactic sugar, one-branch tables can be written concisely,
|
|
</P>
|
|
<PRE>
|
|
\\P,...,Q => t === table {P => ... table {Q => t} ...}
|
|
</PRE>
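<P>
For instance, a noun whose singular and plural forms coincide can be written
with this shorthand (a sketch assuming the <CODE>Noun</CODE> type and the
<CODE>Number</CODE> parameter defined earlier):
</P>
<PRE>
  invarNoun : Str -> Noun = \fish -> {s = \\n => fish} ;

  -- \\n => fish  ===  table {n => fish}
</PRE>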
|
|
<P>
|
|
Finally, the <CODE>case</CODE> expressions common in functional
|
|
programming languages are syntactic sugar for table selections:
|
|
</P>
|
|
<PRE>
|
|
case e of {...} === table {...} ! e
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc49"></A>
|
|
<H2>An intelligent noun paradigm using pattern matching</H2>
|
|
<P>
|
|
It may be hard for the user of a resource morphology to pick the right
|
|
inflection paradigm. A way to help this is to define a more intelligent
|
|
paradigm, which chooses the ending by first analysing the lemma.
|
|
The following variant for English regular nouns puts together all the
|
|
previously shown paradigms, and chooses one of them on the basis of
|
|
the final letter of the lemma (found by the prelude operator <CODE>last</CODE>).
|
|
</P>
|
|
<PRE>
|
|
regNoun : Str -> Noun = \s -> case last s of {
|
|
"s" | "z" => mkNoun s (s + "es") ;
|
|
"y" => mkNoun s (init s + "ies") ;
|
|
_ => mkNoun s (s + "s")
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
This definition displays many GF expression forms not shown before;
these forms are explained in the previous section.
|
|
</P>
|
|
<P>
|
|
The paradigm <CODE>regNoun</CODE> does not give the correct forms for
|
|
all nouns. For instance, <I>mouse - mice</I> and
|
|
<I>fish - fish</I> must be given by using <CODE>mkNoun</CODE>.
|
|
Also the word <I>boy</I> would be inflected incorrectly; to prevent
|
|
this, either use <CODE>mkNoun</CODE> or modify
|
|
<CODE>regNoun</CODE> so that the <CODE>"y"</CODE> case does not
|
|
apply if the second-last character is a vowel.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. Extend the <CODE>regNoun</CODE> paradigm so that it takes care
|
|
of all variations there are in English. Test it with the nouns
|
|
<I>ax</I>, <I>bamboo</I>, <I>boy</I>, <I>bush</I>, <I>hero</I>, <I>match</I>.
|
|
<B>Hint</B>. The library functions <CODE>Predef.dp</CODE> and <CODE>Predef.tk</CODE>
|
|
are useful in this task.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. The same rules that form plural nouns in English also
|
|
apply in the formation of third-person singular verbs.
|
|
Write a regular verb paradigm that uses this idea, but first
|
|
rewrite <CODE>regNoun</CODE> so that the analysis needed to build <I>s</I>-forms
|
|
is factored out as a separate <CODE>oper</CODE>, which is shared with
|
|
<CODE>regVerb</CODE>.
|
|
</P>
|
|
<A NAME="toc50"></A>
|
|
<H2>Morphological resource modules</H2>
|
|
<P>
|
|
A common idiom is to
|
|
gather the <CODE>oper</CODE> and <CODE>param</CODE> definitions
|
|
needed for inflecting words in
|
|
a language into a morphology module. Here is a simple
|
|
example, <A HREF="resource/MorphoEng.gf"><CODE>MorphoEng</CODE></A>.
|
|
</P>
|
|
<PRE>
|
|
--# -path=.:prelude
|
|
|
|
resource MorphoEng = open Prelude in {
|
|
|
|
param
|
|
Number = Sg | Pl ;
|
|
|
|
oper
|
|
Noun, Verb : Type = {s : Number => Str} ;
|
|
|
|
mkNoun : Str -> Str -> Noun = \x,y -> {
|
|
s = table {
|
|
Sg => x ;
|
|
Pl => y
|
|
}
|
|
} ;
|
|
|
|
regNoun : Str -> Noun = \s -> case last s of {
|
|
"s" | "z" => mkNoun s (s + "es") ;
|
|
"y" => mkNoun s (init s + "ies") ;
|
|
_ => mkNoun s (s + "s")
|
|
} ;
|
|
|
|
mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ;
|
|
|
|
regVerb : Str -> Verb = \s -> case last s of {
|
|
"s" | "z" => mkVerb s (s + "es") ;
|
|
"y" => mkVerb s (init s + "ies") ;
|
|
"o" => mkVerb s (s + "es") ;
|
|
_ => mkVerb s (s + "s")
|
|
} ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
The first line gives as a hint to the compiler the
|
|
<B>search path</B> needed to find all the other modules that the
|
|
module depends on. The directory <CODE>prelude</CODE> is a subdirectory of
|
|
<CODE>GF/lib</CODE>; to be able to refer to it in this simple way, you can
|
|
set the environment variable <CODE>GF_LIB_PATH</CODE> to point to this
|
|
directory.
|
|
</P>
|
|
<A NAME="toc51"></A>
|
|
<H1>Using parameters in concrete syntax</H1>
|
|
<P>
|
|
We can now enrich the concrete syntax definitions to
|
|
comprise morphology. This will involve a more radical
|
|
variation between languages (e.g. English and Italian)
|
|
than just the use of different words. In general,
|
|
parameters and linearization types are different in
|
|
different languages - but this does not prevent the
|
|
use of a common abstract syntax.
|
|
</P>
|
|
<A NAME="toc52"></A>
|
|
<H2>Parametric vs. inherent features, agreement</H2>
|
|
<P>
|
|
The rule of subject-verb agreement in English says that the verb
|
|
phrase must be inflected in the number of the subject. This
|
|
means that a noun phrase (functioning as a subject), inherently
|
|
<I>has</I> a number, which it passes to the verb. The verb does not
|
|
<I>have</I> a number, but must be able to <I>receive</I> whatever number the
|
|
subject has. This distinction is nicely represented by the
|
|
different linearization types of <B>noun phrases</B> and <B>verb phrases</B>:
|
|
</P>
|
|
<PRE>
|
|
lincat NP = {s : Str ; n : Number} ;
|
|
lincat VP = {s : Number => Str} ;
|
|
</PRE>
|
|
<P>
|
|
We say that the number of <CODE>NP</CODE> is an <B>inherent feature</B>,
|
|
whereas the number of <CODE>VP</CODE> is a <B>variable feature</B> (or a
|
|
<B>parametric feature</B>).
|
|
</P>
|
|
<P>
|
|
The agreement rule itself is expressed in the linearization rule of
|
|
the predication function:
|
|
</P>
|
|
<PRE>
|
|
lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
|
|
</PRE>
|
|
<P>
|
|
The following section will present
|
|
<CODE>FoodsEng</CODE>, assuming the abstract syntax <CODE>Foods</CODE>
|
|
that is similar to <CODE>Food</CODE> but also has the
|
|
plural determiners <CODE>These</CODE> and <CODE>Those</CODE>.
|
|
The reader is invited to inspect the way in which agreement works in
|
|
the formation of sentences.
|
|
</P>
|
|
<A NAME="toc53"></A>
|
|
<H2>English concrete syntax with parameters</H2>
|
|
<P>
|
|
The grammar uses both
|
|
<A HREF="../../lib/prelude/Prelude.gf"><CODE>Prelude</CODE></A> and
|
|
<A HREF="resource/MorphoEng.gf"><CODE>MorphoEng</CODE></A>.
|
|
We will later see how to make the grammar even
|
|
more high-level by using a resource grammar library
|
|
and parametrized modules.
|
|
</P>
|
|
<PRE>
|
|
--# -path=.:resource:prelude
|
|
|
|
concrete FoodsEng of Foods = open Prelude, MorphoEng in {
|
|
|
|
lincat
|
|
S, Quality = SS ;
|
|
Kind = {s : Number => Str} ;
|
|
Item = {s : Str ; n : Number} ;
|
|
|
|
lin
|
|
Is item quality = ss (item.s ++ (mkVerb "are" "is").s ! item.n ++ quality.s) ;
|
|
This = det Sg "this" ;
|
|
That = det Sg "that" ;
|
|
These = det Pl "these" ;
|
|
Those = det Pl "those" ;
|
|
QKind quality kind = {s = \\n => quality.s ++ kind.s ! n} ;
|
|
Wine = regNoun "wine" ;
|
|
Cheese = regNoun "cheese" ;
|
|
Fish = mkNoun "fish" "fish" ;
|
|
Very = prefixSS "very" ;
|
|
Fresh = ss "fresh" ;
|
|
Warm = ss "warm" ;
|
|
Italian = ss "Italian" ;
|
|
Expensive = ss "expensive" ;
|
|
Delicious = ss "delicious" ;
|
|
Boring = ss "boring" ;
|
|
|
|
oper
|
|
det : Number -> Str -> Noun -> {s : Str ; n : Number} = \n,d,cn -> {
|
|
s = d ++ cn.s ! n ;
|
|
n = n
|
|
} ;
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc54"></A>
|
|
<H2>Hierarchic parameter types</H2>
|
|
<P>
|
|
The reader familiar with a functional programming language such as
|
|
<A HREF="http://www.haskell.org">Haskell</A> must have noticed the similarity
|
|
between parameter types in GF and <B>algebraic datatypes</B> (<CODE>data</CODE> definitions
|
|
in Haskell). The GF parameter types are actually a special case of algebraic
|
|
datatypes: the main restriction is that in GF, these types must be finite.
|
|
(It is this restriction that makes it possible to invert linearization rules into
|
|
parsing methods.)
|
|
</P>
|
|
<P>
|
|
However, finite is not the same thing as enumerated. Even in GF, parameter
|
|
constructors can take arguments, provided these arguments are from other
|
|
parameter types - only recursion is forbidden. Such parameter types impose a
|
|
hierarchic order among parameters. They are often needed to define
|
|
the linguistically most accurate parameter systems.
|
|
</P>
|
|
<P>
|
|
To give an example, Swedish adjectives
|
|
are inflected in number (singular or plural) and
|
|
gender (uter or neuter). These parameters would suggest 2*2=4 different
|
|
forms. However, the gender distinction is done only in the singular. Therefore,
|
|
it would be inaccurate to define adjective paradigms using the type
|
|
<CODE>Gender => Number => Str</CODE>. The following hierarchic definition
|
|
yields an accurate system of three adjectival forms.
|
|
</P>
|
|
<PRE>
|
|
param AdjForm = ASg Gender | APl ;
|
|
param Gender = Utr | Neutr ;
|
|
</PRE>
|
|
<P>
|
|
Here is an example of pattern matching, the paradigm of regular adjectives.
|
|
</P>
|
|
<PRE>
|
|
oper regAdj : Str -> AdjForm => Str = \fin -> table {
|
|
ASg Utr => fin ;
|
|
ASg Neutr => fin + "t" ;
|
|
ASg APl => fin + "a"
|
|
}
|
|
</PRE>
|
|
<P>
|
|
A constructor can be used as a pattern that has patterns as arguments. For instance,
|
|
the adjectival paradigm in which the two singular forms are the same,
|
|
can be defined
|
|
</P>
|
|
<PRE>
|
|
oper plattAdj : Str -> AdjForm => Str = \platt -> table {
|
|
ASg _ => platt ;
|
|
APl => platt + "a"
|
|
}
|
|
</PRE>
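<P>
Selecting from these tables works as expected. For instance, applying
<CODE>regAdj</CODE> to the Swedish adjective <I>varm</I> ("warm") would give
three forms (a sketch):
</P>
<PRE>
  regAdj "varm" ! ASg Utr    --->  "varm"
  regAdj "varm" ! ASg Neutr  --->  "varmt"
  regAdj "varm" ! APl        --->  "varma"
</PRE>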
|
|
<P></P>
|
|
<A NAME="toc55"></A>
|
|
<H2>Morphological analysis and morphology quiz</H2>
|
|
<P>
|
|
Even though morphology is in GF
|
|
mostly used as an auxiliary for syntax, it
|
|
can also be useful on its own right. The command <CODE>morpho_analyse = ma</CODE>
|
|
can be used to read a text and return for each word the analyses that
|
|
it has in the current concrete syntax.
|
|
</P>
|
|
<PRE>
|
|
> rf bible.txt | morpho_analyse
|
|
</PRE>
|
|
<P>
|
|
In the same way as translation exercises, morphological exercises can
|
|
be generated, by the command <CODE>morpho_quiz = mq</CODE>. Usually,
|
|
the category is set to be something else than <CODE>S</CODE>. For instance,
|
|
</P>
|
|
<PRE>
|
|
> cd GF/lib/resource-1.0/
|
|
> i french/IrregFre.gf
|
|
> morpho_quiz -cat=V
|
|
|
|
Welcome to GF Morphology Quiz.
|
|
...
|
|
|
|
réapparaître : VFin VCondit Pl P2
|
|
réapparaitriez
|
|
> No, not réapparaitriez, but
|
|
réapparaîtriez
|
|
Score 0/1
|
|
</PRE>
|
|
<P>
|
|
Finally, a list of morphological exercises can be generated
|
|
off-line and saved in a
|
|
file for later use, by the command <CODE>morpho_list = ml</CODE>
|
|
</P>
|
|
<PRE>
|
|
> morpho_list -number=25 -cat=V | wf exx.txt
|
|
</PRE>
|
|
<P>
|
|
The <CODE>number</CODE> flag gives the number of exercises generated.
|
|
</P>
|
|
<A NAME="toc56"></A>
|
|
<H2>Discontinuous constituents</H2>
|
|
<P>
|
|
A linearization type may contain more strings than one.
|
|
An example where this is useful is English particle
|
|
verbs, such as <I>switch off</I>. The linearization of
|
|
a sentence may place the object between the verb and the particle:
|
|
<I>he switched it off</I>.
|
|
</P>
|
|
<P>
|
|
The following judgement defines transitive verbs as
|
|
<B>discontinuous constituents</B>, i.e. as having a linearization
|
|
type with two strings and not just one.
|
|
</P>
|
|
<PRE>
|
|
lincat TV = {s : Number => Str ; part : Str} ;
|
|
</PRE>
|
|
<P>
|
|
This linearization rule
|
|
shows how the constituents are separated by the object in complementization.
|
|
</P>
|
|
<PRE>
|
|
lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ;
|
|
</PRE>
|
|
<P>
|
|
There is no restriction in the number of discontinuous constituents
|
|
(or other fields) a <CODE>lincat</CODE> may contain. The only condition is that
|
|
the fields must be of finite types, i.e. built from records, tables,
|
|
parameters, and <CODE>Str</CODE>, and not functions.
|
|
</P>
|
|
<P>
|
|
A mathematical result
|
|
about parsing in GF says that the worst-case complexity of parsing
|
|
increases with the number of discontinuous constituents. This is
|
|
potentially a reason to avoid discontinuous constituents.
|
|
Moreover, the parsing and linearization commands only give accurate
|
|
results for categories whose linearization type has a unique <CODE>Str</CODE>
|
|
valued field labelled <CODE>s</CODE>. Therefore, discontinuous constituents
|
|
are not a good idea in top-level categories accessed by the users
|
|
of a grammar application.
|
|
</P>
|
|
<A NAME="toc57"></A>
|
|
<H2>Free variation</H2>
|
|
<P>
|
|
Sometimes there are many alternative ways to define a concrete syntax.
|
|
For instance, the verb negation in English can be expressed both by
|
|
<I>does not</I> and <I>doesn't</I>. In linguistic terms, these expressions
|
|
are in <B>free variation</B>. The <CODE>variants</CODE> construct of GF can
|
|
be used to give a list of strings in free variation. For example,
|
|
</P>
|
|
<PRE>
|
|
NegVerb verb = {s = variants {["does not"] ; "doesn't"} ++ verb.s ! Pl} ;
|
|
</PRE>
|
|
<P>
|
|
An empty variant list
|
|
</P>
|
|
<PRE>
|
|
variants {}
|
|
</PRE>
|
|
<P>
|
|
can be used e.g. if a word lacks a certain form.
|
|
</P>
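<P>
For instance, a defective noun that lacks a plural form could be sketched as
follows (assuming the <CODE>mkNoun</CODE> operation shown earlier):
</P>
<PRE>
  defectiveNoun : Str -> Noun = \s -> mkNoun s (variants {}) ;
</PRE>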
|
|
<P>
|
|
In general, <CODE>variants</CODE> should be used cautiously. It is not
|
|
recommended for modules aimed to be libraries, because the
|
|
user of the library has no way to choose among the variants.
|
|
</P>
|
|
<A NAME="toc58"></A>
|
|
<H2>Overloading of operations</H2>
|
|
<P>
|
|
Large libraries, such as the GF Resource Grammar Library, may define
|
|
hundreds of names, which can be unpractical
|
|
for both the library writer and the user. The writer has to invent longer
|
|
and longer names which are not always intuitive,
|
|
and the user has to learn or at least be able to find all these names.
|
|
A solution to this problem, adopted by languages such as C++, is <B>overloading</B>:
|
|
the same name can be used for several functions. When such a name is used, the
|
|
compiler performs <B>overload resolution</B> to find out which of the possible functions
|
|
is meant. The resolution is based on the types of the functions: all functions that
|
|
have the same name must have different types.
|
|
</P>
|
|
<P>
|
|
In C++, functions with the same name can be scattered everywhere in the program.
|
|
In GF, they must be grouped together in <CODE>overload</CODE> groups. Here is an example
|
|
of an overload group, defining four ways to define nouns in Italian:
|
|
</P>
|
|
<PRE>
|
|
oper mkN = overload {
|
|
mkN : Str -> N = -- regular nouns
|
|
mkN : Str -> Gender -> N = -- regular nouns with unexpected gender
|
|
mkN : Str -> Str -> N = -- irregular nouns
|
|
mkN : Str -> Str -> Gender -> N = -- irregular nouns with unexpected gender
|
|
}
|
|
</PRE>
|
|
<P>
|
|
All of the following uses of <CODE>mkN</CODE> are easy to resolve:
|
|
</P>
|
|
<PRE>
|
|
lin Pizza = mkN "pizza" ; -- Str -> N
|
|
lin Hand = mkN "mano" Fem ; -- Str -> Gender -> N
|
|
lin Man = mkN "uomo" "uomini" ; -- Str -> Str -> N
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc59"></A>
|
|
<H1>More constructs for concrete syntax</H1>
|
|
<P>
|
|
In this chapter, we go through constructs that are not necessary in simple grammars
|
|
or when the concrete syntax relies on libraries. But they are useful when
|
|
writing advanced concrete syntax implementations, such as resource grammar libraries.
|
|
This chapter can safely be skipped if the reader prefers to continue to the
|
|
chapter on using libraries.
|
|
</P>
|
|
<A NAME="toc60"></A>
|
|
<H2>Local definitions</H2>
|
|
<P>
|
|
Local definitions ("<CODE>let</CODE> expressions") are used in functional
|
|
programming for two reasons: to structure the code into smaller
|
|
expressions, and to avoid repeated computation of one and
|
|
the same expression. Here is an example, from
|
|
<A HREF="resource/MorphoIta.gf"><CODE>MorphoIta</CODE></A>:
|
|
</P>
|
|
<PRE>
|
|
oper regNoun : Str -> Noun = \vino ->
|
|
let
|
|
vin = init vino ;
|
|
o = last vino
|
|
in
|
|
case o of {
|
|
"a" => mkNoun Fem vino (vin + "e") ;
|
|
"o" | "e" => mkNoun Masc vino (vin + "i") ;
|
|
_ => mkNoun Masc vino vino
|
|
} ;
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc61"></A>
|
|
<H2>Record extension and subtyping</H2>
|
|
<P>
|
|
Record types and records can be <B>extended</B> with new fields. For instance,
|
|
in German it is natural to see transitive verbs as verbs with a case.
|
|
The symbol <CODE>**</CODE> is used for both constructs.
|
|
</P>
|
|
<PRE>
|
|
lincat TV = Verb ** {c : Case} ;
|
|
|
|
lin Follow = regVerb "folgen" ** {c = Dative} ;
|
|
</PRE>
|
|
<P>
|
|
To extend a record type or a record with a field whose label it
|
|
already has is a type error.
|
|
</P>
|
|
<P>
|
|
A record type <I>T</I> is a <B>subtype</B> of another one <I>R</I>, if <I>T</I> has
|
|
all the fields of <I>R</I> and possibly other fields. For instance,
|
|
an extension of a record type is always a subtype of it.
|
|
</P>
|
|
<P>
|
|
If <I>T</I> is a subtype of <I>R</I>, an object of <I>T</I> can be used whenever
|
|
an object of <I>R</I> is required. For instance, a transitive verb can
|
|
be used whenever a verb is required.
|
|
</P>
|
|
<P>
|
|
<B>Contravariance</B> means that a function taking an <I>R</I> as argument
|
|
can also be applied to any object of a subtype <I>T</I>.
|
|
</P>
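<P>
As a sketch (assuming a <CODE>Verb</CODE> type with a field
<CODE>s : Number => Str</CODE>), a function over <CODE>Verb</CODE> also
accepts a <CODE>TV</CODE>:
</P>
<PRE>
  oper sgForm : Verb -> Str = \v -> v.s ! Sg ;

  -- sgForm (regVerb "folgen" ** {c = Dative}) is well-typed,
  -- since TV = Verb ** {c : Case} is a subtype of Verb
</PRE>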
|
|
<A NAME="toc62"></A>
|
|
<H2>Tuples and product types</H2>
|
|
<P>
|
|
Product types and tuples are syntactic sugar for record types and records:
|
|
</P>
|
|
<PRE>
|
|
T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn}
|
|
<t1, ..., tn> === {p1 = t1 ; ... ; pn = tn}
|
|
</PRE>
|
|
<P>
|
|
Thus the labels <CODE>p1, p2,...</CODE> are hard-coded.
|
|
</P>
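<P>
For instance, the hard-coded labels can be used to access the components of a
pair (a sketch):
</P>
<PRE>
  oper swap : Str * Str -> Str * Str = \p -> <p.p2, p.p1> ;
</PRE>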
|
|
<A NAME="toc63"></A>
|
|
<H2>Record and tuple patterns</H2>
|
|
<P>
|
|
Record types of parameter types are also parameter types.
|
|
A typical example is a record of agreement features, e.g. French
|
|
</P>
|
|
<PRE>
|
|
oper Agr : PType = {g : Gender ; n : Number ; p : Person} ;
|
|
</PRE>
|
|
<P>
|
|
Notice the term <CODE>PType</CODE> rather than just <CODE>Type</CODE> referring to
|
|
parameter types. Every <CODE>PType</CODE> is also a <CODE>Type</CODE>, but not vice-versa.
|
|
</P>
|
|
<P>
|
|
Pattern matching is done in the expected way, but it can moreover
|
|
utilize partial records: the branch
|
|
</P>
|
|
<PRE>
|
|
{g = Fem} => t
|
|
</PRE>
|
|
<P>
|
|
in a table of type <CODE>Agr => T</CODE> means the same as
|
|
</P>
|
|
<PRE>
|
|
{g = Fem ; n = _ ; p = _} => t
|
|
</PRE>
|
|
<P>
|
|
Tuple patterns are translated to record patterns in the
|
|
same way as tuples to records; partial patterns make it
|
|
possible to write, slightly surprisingly,
|
|
</P>
|
|
<PRE>
|
|
case <g,n,p> of {
|
|
<Fem> => t
|
|
...
|
|
}
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc64"></A>
|
|
<H2>Regular expression patterns</H2>
|
|
<P>
|
|
To define string operations computed at compile time, such
|
|
as in morphology, it is handy to use regular expression patterns:
|
|
</P>
|
|
<UL>
|
|
<LI><I>p</I> <CODE>+</CODE> <I>q</I> : token consisting of <I>p</I> followed by <I>q</I>
|
|
<LI><I>p</I> <CODE>*</CODE> : token <I>p</I> repeated 0 or more times
|
|
(at most the length of the string to be matched)
|
|
<LI><CODE>-</CODE> <I>p</I> : matches anything that <I>p</I> does not match
|
|
<LI><I>x</I> <CODE>@</CODE> <I>p</I> : bind to <I>x</I> what <I>p</I> matches
|
|
<LI><I>p</I> <CODE>|</CODE> <I>q</I> : matches what either <I>p</I> or <I>q</I> matches
|
|
</UL>
|
|
|
|
<P>
|
|
The last three apply to all types of patterns, the first two only to token strings.
|
|
As an example, we give a rule for the formation of English word forms
|
|
ending with an <I>s</I> and used in the formation of both plural nouns and
|
|
third-person present-tense verbs.
|
|
</P>
|
|
<PRE>
|
|
add_s : Str -> Str = \w -> case w of {
|
|
_ + "oo" => w + "s" ; -- bamboo
|
|
_ + ("s" | "z" | "x" | "sh" | "o") => w + "es" ; -- bus, hero
|
|
_ + ("a" | "o" | "u" | "e") + "y" => w + "s" ; -- boy
|
|
x + "y" => x + "ies" ; -- fly
|
|
_ => w + "s" -- car
|
|
} ;
|
|
</PRE>
|
|
<P>
|
|
Here is another example, the plural formation in Swedish 2nd declension.
|
|
The second branch uses a variable binding with <CODE>@</CODE> to cover the cases where an
|
|
unstressed pre-final vowel <I>e</I> disappears in the plural
|
|
(<I>nyckel-nycklar, seger-segrar, bil-bilar</I>):
|
|
</P>
|
|
<PRE>
|
|
plural2 : Str -> Str = \w -> case w of {
|
|
pojk + "e" => pojk + "ar" ;
|
|
nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ;
|
|
bil => bil + "ar"
|
|
} ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
Semantics: variables are always bound to the <B>first match</B>, which is the first
|
|
in the sequence of binding lists <CODE>Match p v</CODE> defined as follows. In the definition,
|
|
<CODE>p</CODE> is a pattern and <CODE>v</CODE> is a value. The semantics is given in Haskell notation.
|
|
</P>
|
|
<PRE>
|
|
Match (p1|p2) v = Match p1 v ++ Match p2 v
|
|
Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 |
|
|
i <- [0..length s], (s1,s2) = splitAt i s]
|
|
Match p* s = [[]] if Match "" s ++ Match p s ++ Match (p+p) s ++... /= []
|
|
Match -p v = [[]] if Match p v = []
|
|
Match c v = [[]] if c == v -- for constant and literal patterns c
|
|
Match x v = [[(x,v)]] -- for variable patterns x
|
|
Match x@p v = [[(x,v)]] + M if M = Match p v /= []
|
|
Match p v = [] otherwise -- failure
|
|
</PRE>
|
|
<P>
|
|
Examples:
|
|
</P>
|
|
<UL>
|
|
<LI><CODE>x + "e" + y</CODE> matches <CODE>"peter"</CODE> with <CODE>x = "p", y = "ter"</CODE>
|
|
<LI><CODE>x + "er"*</CODE> matches <CODE>"burgerer"</CODE> with <CODE>x = "burg"</CODE>
|
|
</UL>
|
|
|
|
<P>
|
|
<B>Exercise</B>. Implement the German <B>Umlaut</B> operation on word stems.
|
|
The operation changes the vowel of the stressed stem syllable as follows:
|
|
<I>a</I> to <I>ä</I>, <I>au</I> to <I>äu</I>, <I>o</I> to <I>ö</I>, and <I>u</I> to <I>ü</I>. You
|
|
can assume that the operation only takes syllables as arguments. Test the
|
|
operation to see whether it correctly changes <I>Arzt</I> to <I>Ärzt</I>,
|
|
<I>Baum</I> to <I>Bäum</I>, <I>Topf</I> to <I>Töpf</I>, and <I>Kuh</I> to <I>Küh</I>.
|
|
</P>
|
|
<A NAME="toc65"></A>
|
|
<H2>Prefix-dependent choices</H2>
|
|
<P>
|
|
Sometimes a token has different forms depending on the token
|
|
that follows. An example is the English indefinite article,
|
|
which is <I>an</I> if a vowel follows, <I>a</I> otherwise.
|
|
Which form is chosen can only be decided at run time, i.e.
|
|
when a string is actually built. GF has a special construct for
|
|
such tokens, the <CODE>pre</CODE> construct exemplified in
|
|
</P>
|
|
<PRE>
|
|
oper artIndef : Str =
|
|
pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ;
|
|
</PRE>
|
|
<P>
|
|
Thus
|
|
</P>
|
|
<PRE>
|
|
artIndef ++ "cheese" ---> "a" ++ "cheese"
|
|
artIndef ++ "apple" ---> "an" ++ "apple"
|
|
</PRE>
|
|
<P>
|
|
This very example does not work in all situations: the prefix
|
|
<I>u</I> has no general rules, and some problematic words are
|
|
<I>euphemism, one-eyed, n-gram</I>. It is possible to write
|
|
</P>
|
|
<PRE>
|
|
oper artIndef : Str =
|
|
pre {"a" ;
|
|
"a" / strs {"eu" ; "one"} ;
|
|
"an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"}
|
|
} ;
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc66"></A>
|
|
<H2>Predefined types</H2>
|
|
<P>
|
|
GF has the following predefined categories in abstract syntax:
|
|
</P>
|
|
<PRE>
|
|
cat Int ; -- integers, e.g. 0, 5, 743145151019
|
|
cat Float ; -- floats, e.g. 0.0, 3.1415926
|
|
cat String ; -- strings, e.g. "", "foo", "123"
|
|
</PRE>
|
|
<P>
|
|
The objects of each of these categories are <B>literals</B>
|
|
as indicated in the comments above. No <CODE>fun</CODE> definition
|
|
can have a predefined category as its value type, but
|
|
they can be used as arguments. For example:
|
|
</P>
|
|
<PRE>
|
|
fun StreetAddress : Int -> String -> Address ;
|
|
lin StreetAddress number street = {s = number.s ++ street.s} ;
|
|
|
|
-- e.g. (StreetAddress 10 "Downing Street") : Address
|
|
</PRE>
|
|
<P>
|
|
The linearization type is <CODE>{s : Str}</CODE> for all these categories.
|
|
</P>
|
|
<A NAME="toc67"></A>
|
|
<H1>Using the resource grammar library</H1>
|
|
<P>
|
|
In this chapter, we will take a look at the GF resource grammar library.
|
|
We will use the library to implement a slightly extended <CODE>Food</CODE> grammar
|
|
and port it to some new languages.
|
|
</P>
|
|
<A NAME="toc68"></A>
|
|
<H2>The coverage of the library</H2>
|
|
<P>
|
|
The GF Resource Grammar Library contains grammar rules for
|
|
10 languages (in addition, 2 languages are available as incomplete
|
|
implementations, and a few more are under construction). Its purpose
|
|
is to make these rules available for application programmers,
|
|
who can thereby concentrate on the semantic and stylistic
|
|
aspects of their grammars, without having to think about
|
|
grammaticality. The targeted level of application grammarians
|
|
is that of a skilled programmer with
|
|
a practical knowledge of the target languages, but without
|
|
theoretical knowledge about their grammars.
|
|
Such a combination of
|
|
skills is typical of programmers who, for instance, want to localize
|
|
software to new languages.
|
|
</P>
|
|
<P>
|
|
The current resource languages are
|
|
</P>
|
|
<UL>
|
|
<LI><CODE>Ara</CODE>bic (incomplete)
|
|
<LI><CODE>Cat</CODE>alan (incomplete)
|
|
<LI><CODE>Dan</CODE>ish
|
|
<LI><CODE>Eng</CODE>lish
|
|
<LI><CODE>Fin</CODE>nish
|
|
<LI><CODE>Fre</CODE>nch
|
|
<LI><CODE>Ger</CODE>man
|
|
<LI><CODE>Ita</CODE>lian
|
|
<LI><CODE>Nor</CODE>wegian
|
|
<LI><CODE>Rus</CODE>sian
|
|
<LI><CODE>Spa</CODE>nish
|
|
<LI><CODE>Swe</CODE>dish
|
|
</UL>
|
|
|
|
<P>
|
|
The first three letters (<CODE>Eng</CODE> etc) are used in grammar module names.
|
|
The incomplete Arabic and Catalan implementations are
|
|
enough to be used in many applications; they both contain, among other
|
|
things, complete inflectional morphology.
|
|
</P>
|
|
<A NAME="toc69"></A>
|
|
<H2>The resource API</H2>
|
|
<P>
|
|
The resource library API is divided into language-specific
|
|
and language-independent parts. To put it roughly,
|
|
</P>
|
|
<UL>
|
|
<LI>the syntax API is language-independent, i.e. has the same types and functions for all
|
|
languages.
|
|
Its name is <CODE>Syntax</CODE><I>L</I> for each language <I>L</I>
|
|
<LI>the morphology API is language-specific, i.e. has partly different types and functions
|
|
for different languages.
|
|
Its name is <CODE>Paradigms</CODE><I>L</I> for each language <I>L</I>
|
|
</UL>
|
|
|
|
<P>
|
|
A full documentation of the API is available on-line in the
|
|
<A HREF="../../lib/resource-1.0/synopsis.html">resource synopsis</A>. For our
|
|
examples, we will only need a fragment of the full API.
|
|
</P>
|
|
<P>
|
|
In the first examples,
|
|
we will make use of the following categories, from the module <CODE>Syntax</CODE>.
|
|
</P>
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Category</TH>
|
|
<TH>Explanation</TH>
|
|
<TH COLSPAN="2">Example</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>Utt</CODE></TD>
|
|
<TD>sentence, question, word...</TD>
|
|
<TD>"be quiet"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>Adv</CODE></TD>
|
|
<TD>verb-phrase-modifying adverb</TD>
|
|
<TD>"in the house"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>AdA</CODE></TD>
|
|
<TD>adjective-modifying adverb</TD>
|
|
<TD>"very"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>S</CODE></TD>
|
|
<TD>declarative sentence</TD>
|
|
<TD>"she lived here"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>Cl</CODE></TD>
|
|
<TD>declarative clause, with all tenses</TD>
|
|
<TD>"she looks at this"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>AP</CODE></TD>
|
|
<TD>adjectival phrase</TD>
|
|
<TD>"very warm"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>CN</CODE></TD>
|
|
<TD>common noun (without determiner)</TD>
|
|
<TD>"red house"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>NP</CODE></TD>
|
|
<TD>noun phrase (subject or object)</TD>
|
|
<TD>"the red house"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>Det</CODE></TD>
|
|
<TD>determiner phrase</TD>
|
|
<TD>"those seven"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>Predet</CODE></TD>
|
|
<TD>predeterminer</TD>
|
|
<TD>"only"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>Quant</CODE></TD>
|
|
<TD>quantifier with both sg and pl</TD>
|
|
<TD>"this/these"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>Prep</CODE></TD>
|
|
<TD>preposition, or just case</TD>
|
|
<TD>"in"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>A</CODE></TD>
|
|
<TD>one-place adjective</TD>
|
|
<TD>"warm"</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>N</CODE></TD>
|
|
<TD>common noun</TD>
|
|
<TD>"house"</TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P></P>
|
|
<P>
|
|
We will need the following syntax rules from <CODE>Syntax</CODE>.
|
|
</P>
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TH>Function</TH>
|
|
<TH>Type</TH>
|
|
<TH COLSPAN="2">Example</TH>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkUtt</CODE></TD>
|
|
<TD><CODE>S -> Utt</CODE></TD>
|
|
<TD><I>John walked</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkUtt</CODE></TD>
|
|
<TD><CODE>Cl -> Utt</CODE></TD>
|
|
<TD><I>John walks</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkCl</CODE></TD>
|
|
<TD><CODE>NP -> AP -> Cl</CODE></TD>
|
|
<TD><I>John is very old</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkNP</CODE></TD>
|
|
<TD><CODE>Det -> CN -> NP</CODE></TD>
|
|
<TD><I>the first old man</I></TD>
|
|
</TR>
|
|
<TR>
|
|
<TD><CODE>mkNP</CODE></TD>
|
|
<TD><CODE>Predet -> NP -> NP</CODE></TD>
|
|
<TD><I>only John</I></TD>
|
|
</TR>
|
|
<TR>
<TD><CODE>mkDet</CODE></TD>
<TD><CODE>Quant -> Det</CODE></TD>
<TD><I>this</I></TD>
</TR>
<TR>
<TD><CODE>mkCN</CODE></TD>
<TD><CODE>N -> CN</CODE></TD>
<TD><I>house</I></TD>
</TR>
<TR>
<TD><CODE>mkCN</CODE></TD>
<TD><CODE>AP -> CN -> CN</CODE></TD>
<TD><I>very big blue house</I></TD>
</TR>
<TR>
<TD><CODE>mkAP</CODE></TD>
<TD><CODE>A -> AP</CODE></TD>
<TD><I>old</I></TD>
</TR>
<TR>
<TD><CODE>mkAP</CODE></TD>
<TD><CODE>AdA -> AP -> AP</CODE></TD>
<TD><I>very very old</I></TD>
</TR>
</TABLE>

<P></P>
<P>
We will also need the following structural words from <CODE>Syntax</CODE>.
</P>
<TABLE CELLPADDING="4" BORDER="1">
<TR>
<TH>Function</TH>
<TH>Type</TH>
<TH COLSPAN="2">Example</TH>
</TR>
<TR>
<TD><CODE>all_Predet</CODE></TD>
<TD><CODE>Predet</CODE></TD>
<TD><I>all</I></TD>
</TR>
<TR>
<TD><CODE>defPlDet</CODE></TD>
<TD><CODE>Det</CODE></TD>
<TD><I>the (houses)</I></TD>
</TR>
<TR>
<TD><CODE>this_Quant</CODE></TD>
<TD><CODE>Quant</CODE></TD>
<TD><I>this</I></TD>
</TR>
<TR>
<TD><CODE>very_AdA</CODE></TD>
<TD><CODE>AdA</CODE></TD>
<TD><I>very</I></TD>
</TR>
</TABLE>

<P></P>
<P>
For French, we will use the following part of <CODE>ParadigmsFre</CODE>.
</P>
<TABLE CELLPADDING="4" BORDER="1">
<TR>
<TH>Function</TH>
<TH>Type</TH>
<TH COLSPAN="2">Example</TH>
</TR>
<TR>
<TD><CODE>Gender</CODE></TD>
<TD><CODE>Type</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>masculine</CODE></TD>
<TD><CODE>Gender</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>feminine</CODE></TD>
<TD><CODE>Gender</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>mkN</CODE></TD>
<TD><CODE>(cheval : Str) -> N</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>mkN</CODE></TD>
<TD><CODE>(foie : Str) -> Gender -> N</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>mkA</CODE></TD>
<TD><CODE>(cher : Str) -> A</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>mkA</CODE></TD>
<TD><CODE>(sec,seche : Str) -> A</CODE></TD>
<TD>-</TD>
</TR>
</TABLE>

<P></P>
<P>
For German, we will use the following part of <CODE>ParadigmsGer</CODE>.
</P>
<TABLE CELLPADDING="4" BORDER="1">
<TR>
<TH>Function</TH>
<TH>Type</TH>
<TH COLSPAN="2">Example</TH>
</TR>
<TR>
<TD><CODE>Gender</CODE></TD>
<TD><CODE>Type</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>masculine</CODE></TD>
<TD><CODE>Gender</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>feminine</CODE></TD>
<TD><CODE>Gender</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>neuter</CODE></TD>
<TD><CODE>Gender</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>mkN</CODE></TD>
<TD><CODE>(Stufe : Str) -> N</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>mkN</CODE></TD>
<TD><CODE>(Bild,Bilder : Str) -> Gender -> N</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>mkA</CODE></TD>
<TD><CODE>Str -> A</CODE></TD>
<TD>-</TD>
</TR>
<TR>
<TD><CODE>mkA</CODE></TD>
<TD><CODE>(gut,besser,beste : Str) -> A</CODE></TD>
<TD><I>gut,besser,beste</I></TD>
</TR>
</TABLE>

<P></P>
<P>
<B>Exercise</B>. Try out the morphological paradigms in different languages. Do it
in this way:
</P>
<PRE>
  > i -path=alltenses:prelude -retain alltenses/ParadigmsGer.gfr
  > cc mkN "Farbe"
  > cc mkA "gut" "besser" "beste"
</PRE>
<P></P>
<A NAME="toc70"></A>
<H2>Example: French</H2>
<P>
We start with an abstract syntax that is like the <CODE>Food</CODE> grammar from before,
but adds a plural determiner (<I>all wines</I>) and some new nouns that will
need different genders in most languages.
</P>
<PRE>
  abstract Food = {
    cat
      S ; Item ; Kind ; Quality ;
    fun
      Is : Item -> Quality -> S ;
      This, All : Kind -> Item ;
      QKind : Quality -> Kind -> Kind ;
      Wine, Cheese, Fish, Beer, Pizza : Kind ;
      Very : Quality -> Quality ;
      Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ;
  }
</PRE>
<P>
The French implementation opens <CODE>SyntaxFre</CODE> and <CODE>ParadigmsFre</CODE>
to get access to the resource libraries needed. In order to find
the libraries, a <CODE>path</CODE> directive is prepended; it is interpreted
relative to the environment variable <CODE>GF_LIB_PATH</CODE>.
</P>
<PRE>
  --# -path=.:present:prelude

  concrete FoodFre of Food = open SyntaxFre, ParadigmsFre in {
    lincat
      S = Utt ;
      Item = NP ;
      Kind = CN ;
      Quality = AP ;
    lin
      Is item quality = mkUtt (mkCl item quality) ;
      This kind = mkNP (mkDet this_Quant) kind ;
      All kind = mkNP all_Predet (mkNP defPlDet kind) ;
      QKind quality kind = mkCN quality kind ;
      Wine = mkCN (mkN "vin") ;
      Beer = mkCN (mkN "bière") ;
      Pizza = mkCN (mkN "pizza" feminine) ;
      Cheese = mkCN (mkN "fromage" masculine) ;
      Fish = mkCN (mkN "poisson") ;
      Very quality = mkAP very_AdA quality ;
      Fresh = mkAP (mkA "frais" "fraîche") ;
      Warm = mkAP (mkA "chaud") ;
      Italian = mkAP (mkA "italien") ;
      Expensive = mkAP (mkA "cher") ;
      Delicious = mkAP (mkA "délicieux") ;
      Boring = mkAP (mkA "ennuyeux") ;
  }
</PRE>
<P>
The <CODE>lincat</CODE> definitions in <CODE>FoodFre</CODE> assign <B>resource categories</B>
to <B>application categories</B>. In a sense, the application categories
are <B>semantic</B>, as they correspond to concepts in the grammar application,
whereas the resource categories are <B>syntactic</B>: they give the linguistic
means to express concepts in any application.
</P>
<P>
The <CODE>lin</CODE> definitions likewise assign resource functions to application
functions. Under the hood, there is a lot of matching with parameters to
take care of word order, inflection, and agreement. But the user of the
library sees nothing of this: the only parameters you need to give are
the genders of some nouns, which cannot be correctly inferred from the word.
</P>
<P>
In French, for example, the one-argument <CODE>mkN</CODE> assigns the noun the feminine
gender if and only if it ends with an <I>e</I>. Therefore the words <I>fromage</I> and
<I>pizza</I> are given genders explicitly. One can of course always give genders
manually, to be on the safe side.
</P>
<P>
As for inflection, the one-argument adjective pattern <CODE>mkA</CODE> takes care of
completely regular adjectives such as <I>chaud-chaude</I>, but also of special
cases such as <I>italien-italienne</I>, <I>cher-chère</I>, and <I>délicieux-délicieuse</I>.
But it cannot form <I>frais-fraîche</I> properly. Once again, you can give more
forms to be on the safe side. You can also test the paradigms in the GF
program.
</P>
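<P>
For instance, the French paradigms can be tested interactively in the same way
as the German ones above (a sketch, assuming the same library layout, with
<CODE>ParadigmsFre.gfr</CODE> under <CODE>alltenses</CODE>):
</P>
<PRE>
  > i -path=alltenses:prelude -retain alltenses/ParadigmsFre.gfr
  > cc mkN "pizza" feminine
  > cc mkA "frais" "fraîche"
</PRE>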
<P>
<B>Exercise</B>. Compile the grammar <CODE>FoodFre</CODE> and generate and parse some sentences.
</P>
<P>
<B>Exercise</B>. Write a concrete syntax of <CODE>Food</CODE> for English or some other language
included in the resource library. You can also compare the output with the hand-written
grammars presented earlier in this tutorial.
</P>
<P>
<B>Exercise</B>. In particular, try to write a concrete syntax for Italian, even if
you don't know Italian. What you need to know is that "beer" is <I>birra</I> and
"pizza" is <I>pizza</I>, and that all the nouns and adjectives in the grammar
are regular.
</P>
<A NAME="toc71"></A>
<H2>Functor implementation of multilingual grammars</H2>
<P>
If you did the exercise of writing a concrete syntax of <CODE>Food</CODE> for some other
language, you probably noticed that much of the code looks exactly the same
as for French. The immediate reason for this is that the <CODE>Syntax</CODE> API is the
same for all languages; the deeper reason is that all languages (at least those
in the resource package) implement the same syntactic structures and tend to use them
in similar ways. Thus it is only the lexical parts of a concrete syntax that
you need to write anew for a new language. In brief,
</P>
<UL>
<LI>first copy the concrete syntax for one language
<LI>then change the words (the strings and perhaps some paradigms)
</UL>

<P>
But programming by copy-and-paste is not worthy of a functional programmer.
Can we write a function that takes care of the shared parts of grammar modules?
Yes, we can. It is not a function in the <CODE>fun</CODE> or <CODE>oper</CODE> sense, but
a function operating on modules, called a <B>functor</B>. This construct
is familiar from the functional languages ML and OCaml, but it does not
exist in Haskell. It also bears some resemblance to templates in C++.
Functors are also known as <B>parametrized modules</B>.
</P>
<P>
In GF, a functor is a module that <CODE>open</CODE>s one or more <B>interfaces</B>.
An <CODE>interface</CODE> is a module similar to a <CODE>resource</CODE>, but it only
contains the types of <CODE>oper</CODE>s, not their definitions. You can think
of an interface as a kind of record type. Thus a functor is a kind
of function taking records as arguments and producing a module
as value.
</P>
<P>
Let us look at a functor implementation of the <CODE>Food</CODE> grammar.
Consider its module header first:
</P>
<PRE>
  incomplete concrete FoodI of Food = open Syntax, LexFood in
</PRE>
<P>
In the functor-function analogy, <CODE>FoodI</CODE> would be presented as a function
with the following type signature:
</P>
<PRE>
  FoodI : instance of Syntax -> instance of LexFood -> concrete of Food
</PRE>
<P>
It takes as arguments two interfaces:
</P>
<UL>
<LI><CODE>Syntax</CODE>, the resource grammar interface
<LI><CODE>LexFood</CODE>, the domain-specific lexicon interface
</UL>

<P>
Functors opening <CODE>Syntax</CODE> and a domain lexicon interface are in fact
so typical in GF applications that this structure could be called a <B>design pattern</B>
for GF grammars. The idea in this pattern is, again, that
the languages use the same syntactic structures but different words.
</P>
<P>
Before going into the details of the module bodies, let us look at how functors
are concretely used. An interface has a header such as
</P>
<PRE>
  interface LexFood = open Syntax in
</PRE>
<P>
To give an <CODE>instance</CODE> of it means that all <CODE>oper</CODE>s are given definitions (of
appropriate types). For example,
</P>
<PRE>
  instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in
</PRE>
<P>
Notice that when an interface opens an interface, such as <CODE>Syntax</CODE>, its instance
opens an instance of it. But the instance may also open some resources - typically,
a domain lexicon instance opens a <CODE>Paradigms</CODE> module.
</P>
<P>
In the function-functor analogy, we now have
</P>
<PRE>
  SyntaxGer : instance of Syntax
  LexFoodGer : instance of LexFood
</PRE>
<P>
Thus we can complete the German implementation by "applying" the functor:
</P>
<PRE>
  FoodI SyntaxGer LexFoodGer : concrete of Food
</PRE>
<P>
The GF syntax for doing so is
</P>
<PRE>
  concrete FoodGer of Food = FoodI with
    (Syntax = SyntaxGer),
    (LexFood = LexFoodGer) ;
</PRE>
<P>
Notice that this is the <I>complete</I> module, not just a header of it.
The module body is received from <CODE>FoodI</CODE>, by instantiating the
interface constants with their definitions given in the German
instances.
</P>
<P>
A module of this form, characterized by the keyword <CODE>with</CODE>, is
called a <B>functor instantiation</B>.
</P>
<P>
Here is the complete code for the functor <CODE>FoodI</CODE>:
</P>
<PRE>
  incomplete concrete FoodI of Food = open Syntax, LexFood in {
    lincat
      S = Utt ;
      Item = NP ;
      Kind = CN ;
      Quality = AP ;
    lin
      Is item quality = mkUtt (mkCl item quality) ;
      This kind = mkNP (mkDet this_Quant) kind ;
      All kind = mkNP all_Predet (mkNP defPlDet kind) ;
      QKind quality kind = mkCN quality kind ;
      Wine = mkCN wine_N ;
      Beer = mkCN beer_N ;
      Pizza = mkCN pizza_N ;
      Cheese = mkCN cheese_N ;
      Fish = mkCN fish_N ;
      Very quality = mkAP very_AdA quality ;
      Fresh = mkAP fresh_A ;
      Warm = mkAP warm_A ;
      Italian = mkAP italian_A ;
      Expensive = mkAP expensive_A ;
      Delicious = mkAP delicious_A ;
      Boring = mkAP boring_A ;
  }
</PRE>
<P></P>
<A NAME="toc72"></A>
<H2>Interfaces and instances</H2>
<P>
Let us now define the <CODE>LexFood</CODE> interface:
</P>
<PRE>
  interface LexFood = open Syntax in {
    oper
      wine_N : N ;
      beer_N : N ;
      pizza_N : N ;
      cheese_N : N ;
      fish_N : N ;
      fresh_A : A ;
      warm_A : A ;
      italian_A : A ;
      expensive_A : A ;
      delicious_A : A ;
      boring_A : A ;
  }
</PRE>
<P>
In this interface, only lexical items are declared. In general, an
interface can declare any functions and also types. The <CODE>Syntax</CODE>
interface does so.
</P>
<P>
Here is the German instance of the interface:
</P>
<PRE>
  instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in {
    oper
      wine_N = mkN "Wein" ;
      beer_N = mkN "Bier" "Biere" neuter ;
      pizza_N = mkN "Pizza" "Pizzen" feminine ;
      cheese_N = mkN "Käse" "Käsen" masculine ;
      fish_N = mkN "Fisch" ;
      fresh_A = mkA "frisch" ;
      warm_A = mkA "warm" "wärmer" "wärmste" ;
      italian_A = mkA "italienisch" ;
      expensive_A = mkA "teuer" ;
      delicious_A = mkA "köstlich" ;
      boring_A = mkA "langweilig" ;
  }
</PRE>
<P>
Just to complete the picture, we repeat the German functor instantiation
for <CODE>FoodI</CODE>, this time with a path directive that makes it compilable.
</P>
<PRE>
  --# -path=.:present:prelude

  concrete FoodGer of Food = FoodI with
    (Syntax = SyntaxGer),
    (LexFood = LexFoodGer) ;
</PRE>
<P></P>
<P>
<B>Exercise</B>. Compile and test <CODE>FoodGer</CODE>.
</P>
<P>
<B>Exercise</B>. Refactor <CODE>FoodFre</CODE> into a functor instantiation.
</P>
<A NAME="toc73"></A>
<H2>Adding languages to a functor implementation</H2>
<P>
Once we have an application grammar defined by using a functor,
adding a new language is simple. Just two modules need to be written:
</P>
<UL>
<LI>a domain lexicon instance
<LI>a functor instantiation
</UL>

<P>
The functor instantiation is completely mechanical to write.
Here is one for Finnish:
</P>
<PRE>
  --# -path=.:present:prelude

  concrete FoodFin of Food = FoodI with
    (Syntax = SyntaxFin),
    (LexFood = LexFoodFin) ;
</PRE>
<P>
The domain lexicon instance requires some knowledge of the words of the
language: what words are used for which concepts, how the words are
inflected, plus features such as genders. Here is a lexicon instance for
Finnish:
</P>
<PRE>
  instance LexFoodFin of LexFood = open SyntaxFin, ParadigmsFin in {
    oper
      wine_N = mkN "viini" ;
      beer_N = mkN "olut" ;
      pizza_N = mkN "pizza" ;
      cheese_N = mkN "juusto" ;
      fish_N = mkN "kala" ;
      fresh_A = mkA "tuore" ;
      warm_A = mkA "lämmin" ;
      italian_A = mkA "italialainen" ;
      expensive_A = mkA "kallis" ;
      delicious_A = mkA "herkullinen" ;
      boring_A = mkA "tylsä" ;
  }
</PRE>
<P></P>
<P>
<B>Exercise</B>. Instantiate the functor <CODE>FoodI</CODE> to some language of
your choice.
</P>
<A NAME="toc74"></A>
<H2>Division of labour revisited</H2>
<P>
One purpose of the resource grammars was stated to be a division
of labour between linguists and application grammarians. We can now
reflect on what this means more precisely, by asking ourselves what
skills are required of grammarians working on different components.
</P>
<P>
Building a GF application starts from the abstract syntax. Writing
an abstract syntax requires
</P>
<UL>
<LI>understanding the semantic structure of the application domain
<LI>knowledge of the GF fragment with categories and functions
</UL>

<P>
If the concrete syntax is written by means of a functor, the programmer
has to decide what parts of the implementation are put into the interface
and what parts are shared in the functor. This requires
</P>
<UL>
<LI>knowing how the domain concepts are expressed in natural language
<LI>knowledge of the resource grammar library - the categories and combinators
<LI>understanding what parts are likely to be expressed in language-dependent
ways, so that they must belong to the interface and not the functor
<LI>knowledge of the GF fragment with function applications and strings
</UL>

<P>
Instantiating a ready-made functor to a new language is less demanding.
It requires essentially
</P>
<UL>
<LI>knowing how the domain words are expressed in the language
<LI>knowing, roughly, how these words are inflected
<LI>knowledge of the paradigms available in the library
<LI>knowledge of the GF fragment with function applications and strings
</UL>

<P>
Notice that none of these tasks requires the use of GF records, tables,
or parameters. Thus only a small fragment of GF is needed; the rest of
GF is only relevant for those who write the libraries.
</P>
<P>
Of course, grammar writing is not always straightforward usage of libraries.
For example, GF can be used for other languages than just those in the
libraries - for both natural and formal languages. A knowledge of records
and tables can, unfortunately, also be needed for understanding GF's error
messages.
</P>
<P>
<B>Exercise</B>. Design a small grammar that can be used for controlling
an MP3 player. The grammar should be able to recognize commands such
as <I>play this song</I>, with the following variations:
</P>
<UL>
<LI>verbs: <I>play</I>, <I>remove</I>
<LI>objects: <I>song</I>, <I>artist</I>
<LI>determiners: <I>this</I>, <I>the previous</I>
<LI>verbs without arguments: <I>stop</I>, <I>pause</I>
</UL>

<P>
The implementation goes in the following phases:
</P>
<OL>
<LI>abstract syntax
<LI>functor and lexicon interface
<LI>lexicon instance for the first language
<LI>functor instantiation for the first language
<LI>lexicon instance for the second language
<LI>functor instantiation for the second language
<LI>...
</OL>
<A NAME="toc75"></A>
<H2>Restricted inheritance</H2>
<P>
A functor implementation using the resource <CODE>Syntax</CODE> interface
works as long as all concepts are expressed by using the same structures
in all languages. If this is not the case, the deviant linearization can
be made into a parameter and moved to the domain lexicon interface.
</P>
<P>
Let us take a slightly contrived example: assume that English has
no word for <CODE>Pizza</CODE>, but has to use the paraphrase <I>Italian pie</I>.
This paraphrase is no longer a noun <CODE>N</CODE>, but a complex phrase
in the category <CODE>CN</CODE>. An obvious way to solve this problem is
to change the interface <CODE>LexFood</CODE> so that the constant declared for
<CODE>Pizza</CODE> gets a new type:
</P>
<PRE>
  oper pizza_CN : CN ;
</PRE>
<P>
But this solution is unstable: we may end up changing the interface
and the functor with each new language, and we must every time also
change the interface instances for the old languages to maintain
type correctness.
</P>
<P>
A better solution is to use <B>restricted inheritance</B>: the English
instantiation inherits the functor implementation except for the
constant <CODE>Pizza</CODE>. This is how we write:
</P>
<PRE>
  --# -path=.:present:prelude

  concrete FoodEng of Food = FoodI - [Pizza] with
    (Syntax = SyntaxEng),
    (LexFood = LexFoodEng) **
      open SyntaxEng, ParadigmsEng in {

    lin Pizza = mkCN (mkA "Italian") (mkN "pie") ;
  }
</PRE>
<P>
Restricted inheritance is available for all inherited modules. One can for
instance exclude some mushrooms and pick out just some fruits in
the <CODE>Foodmarket</CODE> example:
</P>
<PRE>
  abstract Foodmarket = Food, Fruit [Peach], Mushroom - [Agaric]
</PRE>
<P>
A concrete syntax of <CODE>Foodmarket</CODE> must then indicate the same inheritance
restrictions.
</P>
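<P>
For instance, an English concrete syntax could mirror the restrictions as
follows. This is a sketch: the module names <CODE>FruitEng</CODE> and
<CODE>MushroomEng</CODE> are assumed here to be concrete syntaxes of
<CODE>Fruit</CODE> and <CODE>Mushroom</CODE>, respectively.
</P>
<PRE>
  concrete FoodmarketEng of Foodmarket =
    FoodEng, FruitEng [Peach], MushroomEng - [Agaric]
</PRE>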
<P>
<B>Exercise</B>. Change <CODE>FoodGer</CODE> in such a way that it says, instead of
<I>X is Y</I>, the equivalent of <I>X must be Y</I> (<I>X muss Y sein</I>).
You will have to browse the full resource API to find all
the functions needed.
</P>
<A NAME="toc76"></A>
<H2>Browsing the resource with GF commands</H2>
<P>
In addition to reading the
<A HREF="../../lib/resource-1.0/synopsis.html">resource synopsis</A>, you
can find resource function combinations by using the parser. This
is so because the resource library is in the end implemented as
a top-level <CODE>abstract-concrete</CODE> grammar, on which parsing
and linearization work.
</P>
<P>
Unfortunately, only English and the Scandinavian languages can be
parsed within acceptable computer resource limits when the full
resource is used.
</P>
<P>
To look for a syntax tree in the overload API by parsing, do like this:
</P>
<PRE>
  > $GF_LIB_PATH
  > i -path=alltenses:prelude alltenses/OverLangEng.gfc
  > p -cat=S -overload "this grammar is too big"
  mkS (mkCl (mkNP (mkDet this_Quant) grammar_N) (mkAP too_AdA big_A))
</PRE>
<P>
To view linearizations in all languages by parsing from English:
</P>
<PRE>
  > i alltenses/langs.gfcm
  > p -cat=S -lang=LangEng "this grammar is too big" | tb
  UseCl TPres ASimul PPos (PredVP (DetCN (DetSg (SgQuant this_Quant)
    NoOrd) (UseN grammar_N)) (UseComp (CompAP (AdAP too_AdA (PositA big_A)))))
  Den här grammatiken är för stor
  Esta gramática es demasiado grande
  (Cyrillic: eta grammatika govorit des'at' jazykov)
  Denne grammatikken er for stor
  Questa grammatica è troppo grande
  Diese Grammatik ist zu groß
  Cette grammaire est trop grande
  Tämä kielioppi on liian suuri
  This grammar is too big
  Denne grammatik er for stor
</PRE>
<P>
Unfortunately, the Russian grammar uses at the moment a different
character encoding than the rest and is therefore not displayed correctly
in a terminal window. However, the GF syntax editor does display all
examples correctly:
</P>
<PRE>
  % gfeditor alltenses/langs.gfcm
</PRE>
<P>
When you have constructed the tree, you will see the following screen:
</P>
<CENTER>
<IMG SRC="../../lib/resource-1.0/doc/10lang-small.png" BORDER="0" ALT="">
</CENTER>
<P>
<B>Exercise</B>. Find the resource grammar translations for the following
English phrases (parse in the category <CODE>Phr</CODE>). You can first try to
build the terms manually.
</P>
<P>
<I>every man loves a woman</I>
</P>
<P>
<I>this grammar speaks more than ten languages</I>
</P>
<P>
<I>which languages aren't in the grammar</I>
</P>
<P>
<I>which languages did you want to speak</I>
</P>
<A NAME="toc77"></A>
<H1>More concepts of abstract syntax</H1>
<A NAME="toc78"></A>
<H2>GF as a logical framework</H2>
<P>
In this section, we will show how
to encode advanced semantic concepts in an abstract syntax.
We use concepts inherited from <B>type theory</B>. Type theory
is the basis of many systems known as <B>logical frameworks</B>, which are
used for representing mathematical theorems and their proofs on a computer.
In fact, GF has a logical framework as its proper part:
this part is the abstract syntax.
</P>
<P>
In a logical framework, the formalization of a mathematical theory
is a set of type and function declarations. The following is an example
of such a theory, represented as an <CODE>abstract</CODE> module in GF.
</P>
<PRE>
  abstract Arithm = {
    cat
      Prop ;                        -- proposition
      Nat ;                         -- natural number
    fun
      Zero : Nat ;                  -- 0
      Succ : Nat -> Nat ;           -- successor of x
      Even : Nat -> Prop ;          -- x is even
      And : Prop -> Prop -> Prop ;  -- A and B
  }
</PRE>
<P></P>
<P>
<B>Exercise</B>. Give a concrete syntax of <CODE>Arithm</CODE>, either from scratch or
by using the resource library.
</P>
<A NAME="toc79"></A>
<H2>Dependent types</H2>
<P>
<B>Dependent types</B> are a characteristic feature of GF,
inherited from the <B>constructive type theory</B> of Martin-Löf and
distinguishing GF from most other grammar formalisms and
functional programming languages.
</P>
<P>
Dependent types can be used for stating stronger
<B>conditions of well-formedness</B> than ordinary types.
A simple example is a "smart house" system, which
defines voice commands for household appliances. This example
is borrowed from the
<A HREF="http://cslipublications.stanford.edu/site/1575865262.html">Regulus Book</A>
(Rayner & al. 2006).
</P>
<P>
One who enters a smart house can use speech to dim lights, switch
on the fan, etc. For each <CODE>Kind</CODE> of device, there is a set of
<CODE>Actions</CODE> that can be performed on it; thus one can dim the lights but
not the fan, for example. These dependencies can be expressed
by making the type <CODE>Action</CODE> dependent on <CODE>Kind</CODE>. We express this
as follows in <CODE>cat</CODE> declarations:
</P>
<PRE>
  cat
    Command ;
    Kind ;
    Action Kind ;
    Device Kind ;
</PRE>
<P>
The crucial use of the dependencies is made in the rule for forming commands:
</P>
<PRE>
  fun CAction : (k : Kind) -> Action k -> Device k -> Command ;
</PRE>
<P>
In other words: an action and a device can be combined into a command only
if they are of the same <CODE>Kind</CODE> <CODE>k</CODE>. If we have the functions
</P>
<PRE>
  DKindOne : (k : Kind) -> Device k ;   -- the light

  light, fan : Kind ;
  dim : Action light ;
</PRE>
<P>
we can form the syntax tree
</P>
<PRE>
  CAction light dim (DKindOne light)
</PRE>
<P>
but we cannot form the trees
</P>
<PRE>
  CAction light dim (DKindOne fan)
  CAction fan dim (DKindOne light)
  CAction fan dim (DKindOne fan)
</PRE>
<P>
Linearization rules are written as usual: the concrete syntax does not
know if a category is a dependent type. In English, you can write as follows:
</P>
<PRE>
  lincat Action = {s : Str} ;
  lin CAction kind act dev = {s = act.s ++ dev.s} ;
</PRE>
<P>
Notice that the argument <CODE>kind</CODE> does not appear in the linearization.
The type checker will be able to reconstruct it from the <CODE>dev</CODE> argument.
</P>
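<P>
To make the fragment complete enough to parse, the remaining constants need
linearizations as well. The following is a minimal sketch, assuming that
<CODE>Kind</CODE>, <CODE>Device</CODE>, and <CODE>Command</CODE> all have the
linearization type <CODE>{s : Str}</CODE>:
</P>
<PRE>
  lincat Kind, Device, Command = {s : Str} ;
  lin
    light = {s = "light"} ;
    fan = {s = "fan"} ;
    dim = {s = "dim"} ;
    DKindOne kind = {s = "the" ++ kind.s} ;
</PRE>
<P>
With these rules, <CODE>CAction light dim (DKindOne light)</CODE> linearizes to
<I>dim the light</I>.
</P>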
<P>
Parsing with dependent types is performed in two phases:
</P>
<OL>
<LI>context-free parsing
<LI>filtering through the type checker
</OL>

<P>
If you just parse in the usual way, you don't enter the second phase, and
the <CODE>kind</CODE> argument is not found:
</P>
<PRE>
  > parse "dim the light"
  CAction ? dim (DKindOne light)
</PRE>
<P>
Moreover, type-incorrect commands are not rejected:
</P>
<PRE>
  > parse "dim the fan"
  CAction ? dim (DKindOne fan)
</PRE>
<P>
The question mark <CODE>?</CODE> is a <B>metavariable</B>, and is returned by the parser
for any subtree that is suppressed by a linearization rule.
</P>
<P>
To get rid of metavariables, you must feed the parse result into the
second phase of <B>solving</B> them. The <CODE>solve</CODE> process uses the dependent
type checker to restore the values of the metavariables. It is invoked by
the command <CODE>put_tree = pt</CODE> with the flag <CODE>-transform=solve</CODE>:
</P>
<PRE>
  > parse "dim the light" | put_tree -transform=solve
  CAction light dim (DKindOne light)
</PRE>
<P>
The <CODE>solve</CODE> process may fail, in which case no tree is returned:
</P>
<PRE>
  > parse "dim the fan" | put_tree -transform=solve
  no tree found
</PRE>
<P></P>
<P>
<B>Exercise</B>. Write an abstract syntax module with the above contents
and an appropriate English concrete syntax. Try to parse the commands
<I>dim the light</I> and <I>dim the fan</I>, with and without <CODE>solve</CODE> filtering.
</P>
<P>
<B>Exercise</B>. Perform random and exhaustive generation, with and without
<CODE>solve</CODE> filtering.
</P>
<P>
<B>Exercise</B>. Add some device kinds and actions to the grammar.
</P>
<A NAME="toc80"></A>
<H2>Polymorphism</H2>
<P>
Sometimes an action can be performed on all kinds of devices. It would be
possible to introduce separate <CODE>fun</CODE> constants for each kind-action pair,
but this would be tedious. Instead, one can use <B>polymorphic</B> actions,
i.e. actions that take a <CODE>Kind</CODE> as an argument and produce an <CODE>Action</CODE>
for that <CODE>Kind</CODE>:
</P>
<PRE>
  fun switchOn, switchOff : (k : Kind) -> Action k ;
</PRE>
<P>
Functions that are not polymorphic are <B>monomorphic</B>. However, the
dichotomy between monomorphism and full polymorphism is not always sufficient
for good semantic modelling: very typically, some actions are defined
for a proper subset of devices, but not just one. For instance, both doors and
windows can be opened, whereas lights cannot.
We will return to this problem by introducing the
concept of <B>restricted polymorphism</B> later,
after a chapter on proof objects.
</P>
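<P>
With these polymorphic declarations, well-typed command trees can be formed
for any <CODE>Kind</CODE>; for instance, using the constants introduced above
(a sketch assuming the same <CODE>Smart</CODE> grammar fragment):
</P>
<PRE>
  CAction fan (switchOn fan) (DKindOne fan)          -- switch on the fan
  CAction light (switchOff light) (DKindOne light)   -- switch off the light
</PRE>
<P>
Here <CODE>switchOn fan : Action fan</CODE> matches the device
<CODE>DKindOne fan : Device fan</CODE>, so the command is well-typed.
</P>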
<A NAME="toc81"></A>
<H2>Dependent types and spoken language models</H2>
<P>
We have used dependent types to control semantic well-formedness
in grammars. This is important in traditional type theory
applications such as proof assistants, where only mathematically
meaningful formulas should be constructed. But semantic filtering has
also proved important in speech recognition, because it reduces the
ambiguity of the results.
</P>
<A NAME="toc82"></A>
<H3>Grammar-based language models</H3>
<P>
The standard way of using GF in speech recognition is by building
<B>grammar-based language models</B>. To this end, GF comes with compilers
into several formats that are used in speech recognition systems.
One such format is GSL, used in the <A HREF="http://www.nuance.com">Nuance speech recognizer</A>.
It is produced from GF simply by printing a grammar with the flag
<CODE>-printer=gsl</CODE>.
</P>
<PRE>
  > import -conversion=finite SmartEng.gf
  > print_grammar -printer=gsl

  ;GSL2.0
  ; Nuance speech recognition grammar for SmartEng
  ; Generated by GF

  .MAIN SmartEng_2

  SmartEng_0 [("switch" "off") ("switch" "on")]
  SmartEng_1 ["dim" ("switch" "off")
              ("switch" "on")]
  SmartEng_2 [(SmartEng_0 SmartEng_3)
              (SmartEng_1 SmartEng_4)]
  SmartEng_3 ("the" SmartEng_5)
  SmartEng_4 ("the" SmartEng_6)
  SmartEng_5 "fan"
  SmartEng_6 "light"
</PRE>
<P>
Now, GSL is a context-free format, so how does it cope with dependent types?
In general, dependent types can give rise to infinitely many basic types
(exercise!), whereas a context-free grammar can by definition only have
finitely many nonterminals.
</P>
<P>
This is where the flag <CODE>-conversion=finite</CODE> is needed in the <CODE>import</CODE>
command. Its effect is to convert a GF grammar with dependent types to
one without, so that each instance of a dependent type is replaced by
an atomic type. This can then be used as a nonterminal in a context-free
grammar. The <CODE>finite</CODE> conversion presupposes that every
dependent type has only finitely many instances, which is in fact
the case in the <CODE>Smart</CODE> grammar.
</P>
<P>
<B>Exercise</B>. If you have access to the Nuance speech recognizer,
test it with GF-generated language models for <CODE>SmartEng</CODE>. Do this
both with and without <CODE>-conversion=finite</CODE>.
</P>
<P>
<B>Exercise</B>. Construct an abstract syntax with infinitely many instances
of dependent types.
</P>
<A NAME="toc83"></A>
|
|
<H3>Statistical language models</H3>
|
|
<P>
|
|
An alternative to grammar-based language models are
|
|
<B>statistical language models</B> (<B>SLM</B>s). An SLM is
|
|
built from a <B>corpus</B>, i.e. a set of utterances. It specifies the
|
|
probability of each <B>n-gram</B>, i.e. sequence of <I>n</I> words. The
|
|
typical value of <I>n</I> is 2 (bigrams) or 3 (trigrams).
|
|
</P>
|
|
<P>
|
|
One advantage of SLMs over grammar-based models is that they are
|
|
<B>robust</B>, i.e. they can be used to recognize sequences that would
|
|
be out of the grammar or the corpus. Another advantage is that
|
|
an SLM can be built "for free" if a corpus is available.
|
|
</P>
|
|
<P>
|
|
However, collecting a corpus can require a lot of work, and writing
|
|
a grammar can be less demanding, especially with tools such as GF or
|
|
Regulus. This advantage of grammars can be combined with robustness
|
|
by creating a back-up SLM from a <B>synthesized corpus</B>. This means
|
|
simply that the grammar is used for generating such a corpus.
|
|
In GF, this can be done with the <CODE>generate_trees</CODE> command.
|
|
As with grammar-based models, the quality of the SLM is better
|
|
if meaningless utterances are excluded from the corpus. Thus
|
|
a good way to generate an SLM from a GF grammar is by using
|
|
dependent types and filtering the results through the type checker:
|
|
</P>
|
|
<PRE>
|
|
> generate_trees | put_trees -transform=solve | linearize
|
|
</PRE>
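<P>
The synthesized corpus is typically saved to a file, from which an SLM
is then trained with an external tool. A sketch of such a pipe, assuming
the <CODE>write_file</CODE> command and an arbitrary file name:
</P>
<PRE>
  > generate_trees | put_trees -transform=solve | linearize | write_file corpus.txt
</PRE>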
|
|
<P></P>
|
|
<P>
|
|
<B>Exercise</B>. Measure the size of the corpus generated from
|
|
<CODE>SmartEng</CODE>, with and without type checker filtering.
|
|
</P>
|
|
<A NAME="toc84"></A>
|
|
<H2>Digression: dependent types in concrete syntax</H2>
|
|
<A NAME="toc85"></A>
|
|
<H3>Variables in function types</H3>
|
|
<P>
|
|
A dependent function type needs to introduce a variable for
|
|
its argument type, as in
|
|
</P>
|
|
<PRE>
|
|
switchOff : (k : Kind) -> Action k
|
|
</PRE>
|
|
<P>
|
|
Function types <I>without</I>
|
|
variables are actually a shorthand notation: writing
|
|
</P>
|
|
<PRE>
|
|
fun PredVP : NP -> VP -> S
|
|
</PRE>
|
|
<P>
|
|
is shorthand for
|
|
</P>
|
|
<PRE>
|
|
fun PredVP : (x : NP) -> (y : VP) -> S
|
|
</PRE>
|
|
<P>
|
|
or any other naming of the variables. Actually the use of variables
|
|
sometimes shortens the code, since they can share a type:
|
|
</P>
|
|
<PRE>
|
|
octuple : (x,y,z,u,v,w,s,t : Str) -> Str
|
|
</PRE>
|
|
<P>
|
|
If a bound variable is not used, it can here, as elsewhere in GF, be replaced by
|
|
a wildcard:
|
|
</P>
|
|
<PRE>
|
|
octuple : (_,_,_,_,_,_,_,_ : Str) -> Str
|
|
</PRE>
|
|
<P>
|
|
A good practice for functions with many arguments of the same type
|
|
is to indicate the number of arguments:
|
|
</P>
|
|
<PRE>
|
|
octuple : (x1,_,_,_,_,_,_,x8 : Str) -> Str
|
|
</PRE>
|
|
<P>
|
|
One can also use the variables to document what each argument is expected
|
|
to provide, as is done in inflection paradigms in the resource grammar.
|
|
</P>
|
|
<PRE>
|
|
mkV : (drink,drank,drunk : Str) -> V
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc86"></A>
|
|
<H3>Polymorphism in concrete syntax</H3>
|
|
<P>
|
|
The <B>functional fragment</B> of GF
|
|
terms and types comprises function types, applications, lambda
|
|
abstracts, constants, and variables. This fragment is similar in
|
|
abstract and concrete syntax. In particular,
|
|
dependent types are also available in concrete syntax.
|
|
We have not made use of them yet,
|
|
but we will now look at one example of how they
|
|
can be used.
|
|
</P>
|
|
<P>
|
|
Those readers who are familiar with functional programming languages
|
|
like ML and Haskell may already have missed <B>polymorphic</B>
|
|
functions. For instance, Haskell programmers have access to
|
|
the functions
|
|
</P>
|
|
<PRE>
|
|
const :: a -> b -> a
|
|
const c _ = c
|
|
|
|
flip :: (a -> b -> c) -> b -> a -> c
|
|
flip f y x = f x y
|
|
</PRE>
|
|
<P>
|
|
which can be used for any given types <CODE>a</CODE>, <CODE>b</CODE>, and <CODE>c</CODE>.
|
|
</P>
|
|
<P>
|
|
The GF counterpart of polymorphic functions are <B>monomorphic</B>
|
|
functions with explicit <B>type variables</B>. Thus the above
|
|
definitions can be written
|
|
</P>
|
|
<PRE>
|
|
oper const : (a,b : Type) -> a -> b -> a =
  \_,_,c,_ -> c ;

oper flip : (a,b,c : Type) -> (a -> b -> c) -> b -> a -> c =
  \_,_,_,f,x,y -> f y x ;
|
|
</PRE>
|
|
<P>
|
|
When the operations are used, the type checker requires
|
|
them to be equipped with all their arguments; this may be a nuisance
|
|
for a Haskell or ML programmer.
|
|
</P>
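<P>
For instance, using <CODE>const</CODE> in another operation definition
requires instantiating both type arguments explicitly. A minimal sketch:
</P>
<PRE>
  oper firstName : Str = const Str Str "John" "Mary" ;  -- computes to "John"
</PRE>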
|
|
<A NAME="toc87"></A>
|
|
<H2>Proof objects</H2>
|
|
<P>
|
|
Perhaps the most well-known idea in constructive type theory is
|
|
the <B>Curry-Howard isomorphism</B>, also known as the
|
|
<B>propositions as types principle</B>. Its earliest formulations
|
|
were attempts to give semantics to the logical systems of
|
|
propositional and predicate calculus. In this section, we will consider
|
|
a more elementary example, showing how the notion of proof is useful
|
|
outside mathematics, as well.
|
|
</P>
|
|
<P>
|
|
We first define the category of unary (also known as Peano-style)
|
|
natural numbers:
|
|
</P>
|
|
<PRE>
|
|
cat Nat ;
|
|
fun Zero : Nat ;
|
|
fun Succ : Nat -> Nat ;
|
|
</PRE>
|
|
<P>
|
|
The <B>successor function</B> <CODE>Succ</CODE> generates an infinite
|
|
sequence of natural numbers, beginning from <CODE>Zero</CODE>.
|
|
</P>
|
|
<P>
|
|
We then define what it means for a number <I>x</I> to be <I>less than</I>
|
|
a number <I>y</I>. Our definition is based on two axioms:
|
|
</P>
|
|
<UL>
|
|
<LI><CODE>Zero</CODE> is less than <CODE>Succ</CODE> <I>y</I> for any <I>y</I>.
|
|
<LI>If <I>x</I> is less than <I>y</I>, then <CODE>Succ</CODE> <I>x</I> is less than <CODE>Succ</CODE> <I>y</I>.
|
|
</UL>
|
|
|
|
<P>
|
|
The most straightforward way of expressing these axioms in type theory
|
|
is as typing judgements that introduce objects of a type <CODE>Less</CODE> <I>x y</I>:
|
|
</P>
|
|
<PRE>
|
|
cat Less Nat Nat ;
|
|
fun lessZ : (y : Nat) -> Less Zero (Succ y) ;
|
|
fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ;
|
|
</PRE>
|
|
<P>
|
|
Objects formed by <CODE>lessZ</CODE> and <CODE>lessS</CODE> are
|
|
called <B>proof objects</B>: they establish the truth of certain
|
|
mathematical propositions.
|
|
For instance, the fact that 2 is less than
|
|
4 has the proof object
|
|
</P>
|
|
<PRE>
|
|
lessS (Succ Zero) (Succ (Succ (Succ Zero)))
|
|
(lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero)))
|
|
</PRE>
|
|
<P>
|
|
whose type is
|
|
</P>
|
|
<PRE>
|
|
Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero))))
|
|
</PRE>
|
|
<P>
|
|
which is the formalization of the proposition that 2 is less than 4.
|
|
</P>
|
|
<P>
|
|
GF grammars can be used to provide a <B>semantic control</B> of
|
|
well-formedness of expressions. We have already seen examples of this:
|
|
the grammar of well-formed actions on household devices. By introducing proof objects
|
|
we have now added a very powerful technique of expressing semantic conditions.
|
|
</P>
|
|
<P>
|
|
A simple example of the use of proof objects is the definition of
|
|
well-formed <I>time spans</I>: a time span is expected to be from an earlier to
|
|
a later time:
|
|
</P>
|
|
<PRE>
|
|
from 3 to 8
|
|
</PRE>
|
|
<P>
|
|
is thus well-formed, whereas
|
|
</P>
|
|
<PRE>
|
|
from 8 to 3
|
|
</PRE>
|
|
<P>
|
|
is not. The following rules for spans impose this condition
|
|
by using the <CODE>Less</CODE> predicate:
|
|
</P>
|
|
<PRE>
|
|
cat Span ;
|
|
fun span : (m,n : Nat) -> Less m n -> Span ;
|
|
</PRE>
|
|
<P></P>
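<P>
In the concrete syntax, the proof argument typically leaves no trace in
the string: it only guarantees well-formedness. A sketch of a possible
linearization rule, suppressing the proof with a wildcard:
</P>
<PRE>
  lincat Span = {s : Str} ;
  lin span m n _ = {s = "from" ++ m.s ++ "to" ++ n.s} ;
</PRE>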
|
|
<P>
|
|
<B>Exercise</B>. Write an abstract and concrete syntax with the
|
|
concepts of this section, and experiment with it in GF.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. Define the notions of "even" and "odd" in terms
|
|
of proof objects. <B>Hint</B>. You need one function for proving
|
|
that 0 is even, and two other functions for propagating the
|
|
properties.
|
|
</P>
|
|
<A NAME="toc88"></A>
|
|
<H3>Proof-carrying documents</H3>
|
|
<P>
|
|
Another possible application of proof objects is <B>proof-carrying documents</B>:
|
|
to be semantically well-formed, the abstract syntax of a document must contain a proof
|
|
of some property, although the proof is not shown in the concrete document.
|
|
Think, for instance, of small documents describing flight connections:
|
|
</P>
|
|
<P>
|
|
<I>To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.</I>
|
|
</P>
|
|
<P>
|
|
The well-formedness of this text is partly expressible by dependent typing:
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
City ;
|
|
Flight City City ;
|
|
fun
|
|
Gothenburg, Frankfurt, Prague : City ;
|
|
LH3043 : Flight Gothenburg Frankfurt ;
|
|
OK0537 : Flight Frankfurt Prague ;
|
|
</PRE>
|
|
<P>
|
|
This rules out texts saying <I>take OK0537 from Gothenburg to Prague</I>.
|
|
However, there is a
|
|
further condition saying that it must be possible to
|
|
change from LH3043 to OK0537 in Frankfurt.
|
|
This can be modelled as a proof object of a suitable type,
|
|
which is required by the constructor
|
|
that connects flights.
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
IsPossible (x,y,z : City)(Flight x y)(Flight y z) ;
|
|
fun
|
|
Connect : (x,y,z : City) ->
|
|
(u : Flight x y) -> (v : Flight y z) ->
|
|
IsPossible x y z u v -> Flight x z ;
|
|
</PRE>
|
|
<P></P>
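<P>
The proof objects of <CODE>IsPossible</CODE> can be supplied by constants
encoding, for instance, timetable facts. A hypothetical example:
</P>
<PRE>
  fun viaFrankfurt : IsPossible Gothenburg Frankfurt Prague LH3043 OK0537 ;
</PRE>
<P>
With this axiom, the connection above is represented by the well-typed tree
<CODE>Connect Gothenburg Frankfurt Prague LH3043 OK0537 viaFrankfurt</CODE>,
which has type <CODE>Flight Gothenburg Prague</CODE>.
</P>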
|
|
<A NAME="toc89"></A>
|
|
<H2>Restricted polymorphism</H2>
|
|
<P>
|
|
In the first version of the smart house grammar <CODE>Smart</CODE>,
|
|
all Actions were either
|
|
</P>
|
|
<UL>
|
|
<LI><B>monomorphic</B>: defined for one Kind
|
|
<LI><B>polymorphic</B>: defined for all Kinds
|
|
</UL>
|
|
|
|
<P>
|
|
To make this scale up for new Kinds, we can refine this to
|
|
<B>restricted polymorphism</B>: defined for Kinds of a certain <B>class</B>.
|
|
</P>
|
|
<P>
|
|
The notion of class can be expressed in abstract syntax
|
|
by using the Curry-Howard isomorphism as follows:
|
|
</P>
|
|
<UL>
|
|
<LI>a class is a <B>predicate</B> of Kinds - i.e. a type depending on Kinds
|
|
<LI>a Kind is in a class if there is a proof object of this type
|
|
</UL>
|
|
|
|
<P>
|
|
Here is an example with switching and dimming. The classes are called
|
|
<CODE>switchable</CODE> and <CODE>dimmable</CODE>.
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
Switchable Kind ;
|
|
Dimmable Kind ;
|
|
fun
|
|
switchable_light : Switchable light ;
|
|
switchable_fan : Switchable fan ;
|
|
dimmable_light : Dimmable light ;
|
|
|
|
switchOn : (k : Kind) -> Switchable k -> Action k ;
|
|
dim : (k : Kind) -> Dimmable k -> Action k ;
|
|
</PRE>
|
|
<P>
|
|
One advantage of this formalization is that classes for new
|
|
actions can be added incrementally.
|
|
</P>
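<P>
With these rules, a well-typed action tree carries a proof that its Kind
belongs to the required class, for instance:
</P>
<PRE>
  switchOn light switchable_light : Action light
  dim light dimmable_light : Action light
</PRE>
<P>
Since no proof object of type <CODE>Dimmable fan</CODE> is available,
no application of <CODE>dim</CODE> to <CODE>fan</CODE> can be type checked.
</P>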
|
|
<P>
|
|
<B>Exercise</B>. Write a new version of the <CODE>Smart</CODE> grammar with
|
|
classes, and test it in GF.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. Add some actions, kinds, and classes to the grammar.
|
|
Try to port the grammar to a new language. You will probably find
|
|
out that restricted polymorphism works differently in different languages.
|
|
For instance, in Finnish not only doors but also TVs and radios
|
|
can be "opened", which means switching them on.
|
|
</P>
|
|
<A NAME="toc90"></A>
|
|
<H2>Variable bindings</H2>
|
|
<P>
|
|
Mathematical notation and programming languages have
|
|
expressions that <B>bind</B> variables. For instance,
|
|
a universally quantified proposition
|
|
</P>
|
|
<PRE>
|
|
(All x)B(x)
|
|
</PRE>
|
|
<P>
|
|
consists of the <B>binding</B> <CODE>(All x)</CODE> of the variable <CODE>x</CODE>,
|
|
and the <B>body</B> <CODE>B(x)</CODE>, where the variable <CODE>x</CODE> can have
|
|
<B>bound occurrences</B>.
|
|
</P>
|
|
<P>
|
|
Variable bindings appear in informal mathematical language as well, for
|
|
instance,
|
|
</P>
|
|
<PRE>
|
|
for all x, x is equal to x
|
|
|
|
the function that for any numbers x and y returns the maximum of x+y
|
|
and x*y
|
|
|
|
Let x be a natural number. Assume that x is even. Then x + 3 is odd.
|
|
</PRE>
|
|
<P>
|
|
In type theory, variable-binding expression forms can be formalized
|
|
as functions that take functions as arguments. The universal
|
|
quantifier is defined
|
|
</P>
|
|
<PRE>
|
|
fun All : (Ind -> Prop) -> Prop
|
|
</PRE>
|
|
<P>
|
|
where <CODE>Ind</CODE> is the type of individuals and <CODE>Prop</CODE>,
|
|
the type of propositions. If we have, for instance, the equality predicate
|
|
</P>
|
|
<PRE>
|
|
fun Eq : Ind -> Ind -> Prop
|
|
</PRE>
|
|
<P>
|
|
we may form the tree
|
|
</P>
|
|
<PRE>
|
|
All (\x -> Eq x x)
|
|
</PRE>
|
|
<P>
|
|
which corresponds to the ordinary notation
|
|
</P>
|
|
<PRE>
|
|
(All x)(x = x).
|
|
</PRE>
|
|
<P>
|
|
An abstract syntax where trees have functions as arguments, as in
|
|
the two examples above, has turned out to be precisely the right
|
|
thing for the semantics and computer implementation of
|
|
variable-binding expressions. The advantage lies in the fact that
|
|
only one variable-binding expression form is needed, the lambda abstract
|
|
<CODE>\x -> b</CODE>, and all other bindings can be reduced to it.
|
|
This makes it easier to implement mathematical theories and reason
|
|
about them, since variable binding is tricky to implement and
|
|
to reason about. The idea of using functions as arguments of
|
|
syntactic constructors is known as <B>higher-order abstract syntax</B>.
|
|
</P>
|
|
<P>
|
|
The question now arises: how to define linearization rules
|
|
for variable-binding expressions?
|
|
Let us first consider universal quantification,
|
|
</P>
|
|
<PRE>
|
|
fun All : (Ind -> Prop) -> Prop
|
|
</PRE>
|
|
<P>
|
|
We write
|
|
</P>
|
|
<PRE>
|
|
lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s}
|
|
</PRE>
|
|
<P>
|
|
to obtain the form shown above.
|
|
This linearization rule brings in a new GF concept - the <CODE>$0</CODE>
|
|
field of <CODE>B</CODE> containing a bound variable symbol.
|
|
The general rule is that, if an argument type of a function is
|
|
itself a function type <CODE>A -> C</CODE>, the linearization type of
|
|
this argument is the linearization type of <CODE>C</CODE>
|
|
together with a new field <CODE>$0 : Str</CODE>. In the linearization rule
|
|
for <CODE>All</CODE>, the argument <CODE>B</CODE> thus has the linearization
|
|
type
|
|
</P>
|
|
<PRE>
|
|
{$0 : Str ; s : Str},
|
|
</PRE>
|
|
<P>
|
|
since the linearization type of <CODE>Prop</CODE> is
|
|
</P>
|
|
<PRE>
|
|
{s : Str}
|
|
</PRE>
|
|
<P>
|
|
In other words, the linearization of a function
|
|
consists of a linearization of the body together with a
|
|
field for a linearization of the bound variable.
|
|
Those familiar with type theory or lambda calculus
|
|
should notice that GF requires trees to be in
|
|
<B>eta-expanded</B> form in order to be linearizable:
|
|
any function of type
|
|
</P>
|
|
<PRE>
|
|
A -> B
|
|
</PRE>
|
|
<P>
|
|
always has a syntax tree of the form
|
|
</P>
|
|
<PRE>
|
|
\x -> b
|
|
</PRE>
|
|
<P>
|
|
where <CODE>b : B</CODE> under the assumption <CODE>x : A</CODE>.
|
|
It is in this form that an expression can be analysed
|
|
as having a bound variable and a body.
|
|
</P>
|
|
<P>
|
|
Given the linearization rule
|
|
</P>
|
|
<PRE>
|
|
lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"}
|
|
</PRE>
|
|
<P>
|
|
the linearization of
|
|
</P>
|
|
<PRE>
|
|
\x -> Eq x x
|
|
</PRE>
|
|
<P>
|
|
is the record
|
|
</P>
|
|
<PRE>
|
|
{$0 = "x", s = ["( x = x )"]}
|
|
</PRE>
|
|
<P>
|
|
Thus we can compute the linearization of the formula,
|
|
</P>
|
|
<PRE>
|
|
All (\x -> Eq x x) --> {s = ["( All x ) ( x = x )"]}.
|
|
</PRE>
|
|
<P>
|
|
How did we get the <I>linearization</I> of the variable <CODE>x</CODE>
|
|
into the string <CODE>"x"</CODE>? GF grammars have no rules for
|
|
this: it is just hard-wired in GF that variable symbols are
|
|
linearized into the same strings that represent them in
|
|
the print-out of the abstract syntax.
|
|
</P>
|
|
<P>
|
|
To be able to <I>parse</I> variable symbols, however, GF needs to know what
|
|
to look for (instead of e.g. trying to parse <I>any</I>
|
|
string as a variable). What strings are parsed as variable symbols
|
|
is defined in the lexical analysis part of GF parsing
|
|
</P>
|
|
<PRE>
|
|
> p -cat=Prop -lexer=codevars "(All x)(x = x)"
|
|
All (\x -> Eq x x)
|
|
</PRE>
|
|
<P>
|
|
(see more details on lexers below). If several variables are bound in the
|
|
same argument, the labels are <CODE>$0, $1, $2</CODE>, etc.
|
|
</P>
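<P>
For instance, a (hypothetical) quantifier binding two variables at once
could be declared and linearized as follows:
</P>
<PRE>
  fun All2 : (Ind -> Ind -> Prop) -> Prop ;
  lin All2 B = {s = "(" ++ "All" ++ B.$0 ++ B.$1 ++ ")" ++ B.s} ;
</PRE>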
|
|
<P>
|
|
<B>Exercise</B>. Write an abstract syntax of the whole
|
|
<B>predicate calculus</B>, with the
|
|
<B>connectives</B> "and", "or", "implies", and "not", and the
|
|
<B>quantifiers</B> "exists" and "for all". Use higher-order functions
|
|
to guarantee that unbound variables do not occur.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. Write a concrete syntax for your favourite
|
|
notation of predicate calculus. Use Latex as target language
|
|
if you want nice output. You can also try producing Haskell boolean
|
|
expressions. Use as many parentheses as you need to
|
|
guarantee non-ambiguity.
|
|
</P>
|
|
<A NAME="toc91"></A>
|
|
<H2>Semantic definitions</H2>
|
|
<P>
|
|
We have seen that,
|
|
just like functional programming languages, GF has declarations
|
|
of functions, telling what the type of a function is.
|
|
But we have not yet shown how to <B>compute</B>
|
|
these functions: all we can do is provide them with arguments
|
|
and linearize the resulting terms.
|
|
Since our main interest is the well-formedness of expressions,
|
|
this has not yet bothered
|
|
us very much. As we will see, however, computation does play a role
|
|
even in the well-formedness of expressions when dependent types are
|
|
present.
|
|
</P>
|
|
<P>
|
|
GF has a form of judgement for <B>semantic definitions</B>,
|
|
recognized by the key word <CODE>def</CODE>. At its simplest, it is just
|
|
the definition of one constant, e.g.
|
|
</P>
|
|
<PRE>
|
|
def one = Succ Zero ;
|
|
</PRE>
|
|
<P>
|
|
We can also define a function with arguments,
|
|
</P>
|
|
<PRE>
|
|
def Neg A = Impl A Abs ;
|
|
</PRE>
|
|
<P>
|
|
which is still a special case of the most general notion of
|
|
definition, that of a group of <B>pattern equations</B>:
|
|
</P>
|
|
<PRE>
|
|
def
|
|
sum x Zero = x ;
|
|
sum x (Succ y) = Succ (sum x y) ;
|
|
</PRE>
|
|
<P>
|
|
To compute a term is, as in functional programming languages,
|
|
simply to follow a chain of reductions until no definition
|
|
can be applied. For instance, we compute
|
|
</P>
|
|
<PRE>
|
|
sum one one -->
|
|
sum (Succ Zero) (Succ Zero) -->
|
|
Succ (sum (Succ Zero) Zero) -->
|
|
Succ (Succ Zero)
|
|
</PRE>
|
|
<P>
|
|
Computation in GF is performed with the <CODE>pt</CODE> command and the
|
|
<CODE>compute</CODE> transformation, e.g.
|
|
</P>
|
|
<PRE>
|
|
> p -tr "1 + 1" | pt -transform=compute -tr | l
|
|
sum one one
|
|
Succ (Succ Zero)
|
|
s(s(0))
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
The <CODE>def</CODE> definitions of a grammar induce a notion of
|
|
<B>definitional equality</B> among trees: two trees are
|
|
definitionally equal if they compute into the same tree.
|
|
Thus, trivially, all trees in a chain of computation
|
|
(such as the one above)
|
|
are definitionally equal to each other. So are the trees
|
|
</P>
|
|
<PRE>
|
|
sum Zero (Succ one)
|
|
Succ one
|
|
sum (sum Zero Zero) (sum (Succ Zero) one)
|
|
</PRE>
|
|
<P>
|
|
and infinitely many other trees.
|
|
</P>
|
|
<P>
|
|
A fact that has to be emphasized about <CODE>def</CODE> definitions is that
|
|
they are <I>not</I> performed as a first step of linearization.
|
|
We say that <B>linearization is intensional</B>, which means that
|
|
the definitional equality of two trees does not imply that
|
|
they have the same linearizations. For instance, each of the seven terms
|
|
shown above has a different linearization in arithmetic notation:
|
|
</P>
|
|
<PRE>
|
|
1 + 1
|
|
s(0) + s(0)
|
|
s(s(0) + 0)
|
|
s(s(0))
|
|
0 + s(0)
|
|
s(1)
|
|
0 + 0 + s(0) + 1
|
|
</PRE>
|
|
<P>
|
|
This notion of intensionality is
|
|
no more exotic than the intensionality of any <B>pretty-printing</B>
|
|
function of a programming language (function that shows
|
|
the expressions of the language as strings). It is vital for
|
|
pretty-printing to be intensional in this sense - if we want,
|
|
for instance, to trace a chain of computation by pretty-printing each
|
|
intermediate step, what we want to see is a sequence of different
|
|
expressions, which are definitionally equal.
|
|
</P>
|
|
<P>
|
|
What is more exotic is that GF has two ways of referring to the
|
|
abstract syntax objects. In the concrete syntax, the reference is intensional.
|
|
In the abstract syntax, the reference is extensional, since
|
|
<B>type checking is extensional</B>. The reason is that,
|
|
in a type theory with dependent types, types may depend on terms.
|
|
Two types depending on terms that are definitionally equal are
|
|
equal types. For instance,
|
|
</P>
|
|
<PRE>
|
|
Proof (Odd one)
|
|
Proof (Odd (Succ Zero))
|
|
</PRE>
|
|
<P>
|
|
are equal types. Hence, any tree that type checks as a proof that
|
|
1 is odd also type checks as a proof that the successor of 0 is odd.
|
|
(Recall, in this connection, that the
|
|
arguments a category depends on never play any role
|
|
in the linearization of trees of that category,
|
|
nor in the definition of the linearization type.)
|
|
</P>
|
|
<P>
|
|
In addition to computation, definitions impose a
|
|
<B>paraphrase</B> relation on expressions:
|
|
two strings are paraphrases if they
|
|
are linearizations of trees that are
|
|
definitionally equal.
|
|
Paraphrases are sometimes interesting for
|
|
translation: the <B>direct translation</B>
|
|
of a string, which is the linearization of the same tree
|
|
in the target language, may be inadequate because it is e.g.
|
|
unidiomatic or ambiguous. In such a case,
|
|
the translation algorithm may be made to consider
|
|
translation by a paraphrase.
|
|
</P>
|
|
<P>
|
|
To express the distinction between
|
|
<B>constructors</B> (=<B>canonical</B> functions)
|
|
and other functions, GF has a judgement form
|
|
<CODE>data</CODE> to tell that certain functions are canonical, e.g.
|
|
</P>
|
|
<PRE>
|
|
data Nat = Succ | Zero ;
|
|
</PRE>
|
|
<P>
|
|
Unlike in Haskell, but similarly to ALF (where constructor functions
|
|
are marked with a flag <CODE>C</CODE>),
|
|
new constructors can be added to
|
|
a type with new <CODE>data</CODE> judgements. The type signatures of constructors
|
|
are given separately, in ordinary <CODE>fun</CODE> judgements.
|
|
One can also write directly
|
|
</P>
|
|
<PRE>
|
|
data Succ : Nat -> Nat ;
|
|
</PRE>
|
|
<P>
|
|
which is equivalent to the two judgements
|
|
</P>
|
|
<PRE>
|
|
fun Succ : Nat -> Nat ;
|
|
data Nat = Succ ;
|
|
</PRE>
|
|
<P></P>
|
|
<P>
|
|
<B>Exercise</B>. Implement an interpreter of a small functional programming
|
|
language with natural numbers, lists, pairs, lambdas, etc. Use higher-order
|
|
abstract syntax with semantic definitions. As target language, use
|
|
your favourite programming language.
|
|
</P>
|
|
<P>
|
|
<B>Exercise</B>. To make your interpreted language look nice, use
|
|
<B>precedences</B> instead of putting parentheses everywhere.
|
|
You can use the <A HREF="../../lib/prelude/Precedence.gf">precedence library</A>
|
|
of GF to facilitate this.
|
|
</P>
|
|
<A NAME="toc92"></A>
|
|
<H1>Practical issues</H1>
|
|
<A NAME="toc93"></A>
|
|
<H2>Lexers and unlexers</H2>
|
|
<P>
|
|
Lexers and unlexers can be chosen from
|
|
a list of predefined ones, using the flags <CODE>-lexer</CODE> and <CODE>-unlexer</CODE> either
|
|
in the grammar file or on the GF command line. Here are some often-used lexers
|
|
and unlexers:
|
|
</P>
|
|
<PRE>
|
|
The default is words.
|
|
-lexer=words tokens are separated by spaces or newlines
|
|
-lexer=literals like words, but GF integer and string literals recognized
|
|
-lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta
|
|
-lexer=chars each character is a token
|
|
-lexer=code use Haskell's lex
|
|
-lexer=codevars like code, but treat unknown words as variables, ?? as meta
|
|
-lexer=text with conventions on punctuation and capital letters
|
|
-lexer=codelit like code, but treat unknown words as string literals
|
|
-lexer=textlit like text, but treat unknown words as string literals
|
|
|
|
The default is unwords.
|
|
-unlexer=unwords space-separated token list (like unwords)
|
|
-unlexer=text format as text: punctuation, capitals, paragraph <p>
|
|
-unlexer=code format as code (spacing, indentation)
|
|
-unlexer=textlit like text, but remove string literal quotes
|
|
-unlexer=codelit like code, but remove string literal quotes
|
|
-unlexer=concat remove all spaces
|
|
</PRE>
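<P>
For instance, the <CODE>text</CODE> lexer and unlexer can be combined in
a pipe to parse and reproduce ordinary text, with its conventions on
capitalization and punctuation (a sketch, assuming the current grammar
covers the sentence):
</P>
<PRE>
  > parse -lexer=text "Zero is even." | linearize -unlexer=text
</PRE>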
|
|
<P>
|
|
More options can be found by <CODE>help -lexer</CODE> and <CODE>help -unlexer</CODE>.
|
|
</P>
|
|
<A NAME="toc94"></A>
|
|
<H2>Speech input and output</H2>
|
|
<P>
|
|
The <CODE>speak_aloud = sa</CODE> command sends a string to the speech
|
|
synthesizer
|
|
<A HREF="http://www.speech.cs.cmu.edu/flite/doc/">Flite</A>.
|
|
It is typically used via a pipe:
|
|
</P>
|
|
<PRE>
|
|
generate_random | linearize | speak_aloud
|
|
</PRE>
|
|
<P>
|
|
The result is only satisfactory for English.
|
|
</P>
|
|
<P>
|
|
The <CODE>speech_input = si</CODE> command receives a string from a
|
|
speech recognizer that requires the installation of
|
|
<A HREF="http://mi.eng.cam.ac.uk/~sjy/software.htm">ATK</A>.
|
|
It is typically used to pipe input to a parser:
|
|
</P>
|
|
<PRE>
|
|
speech_input -tr | parse
|
|
</PRE>
|
|
<P>
|
|
The method works only for grammars of English.
|
|
</P>
|
|
<P>
|
|
Both Flite and ATK are freely available through the links
|
|
above, but they are not distributed together with GF.
|
|
</P>
|
|
<A NAME="toc95"></A>
|
|
<H2>Multilingual syntax editor</H2>
|
|
<P>
|
|
The
|
|
<A HREF="http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">Editor User Manual</A>
|
|
describes the use of the editor, which works for any multilingual GF grammar.
|
|
</P>
|
|
<P>
|
|
Here is a snapshot of the editor:
|
|
</P>
|
|
<P>
|
|
<center>
|
|
</P>
|
|
<P>
|
|
<IMG ALIGN="middle" SRC="../quick-editor.png" BORDER="0" ALT="">
|
|
</P>
|
|
<P>
|
|
</center>
|
|
</P>
|
|
<P>
|
|
The grammars of the snapshot are from the
|
|
<A HREF="http://www.cs.chalmers.se/~aarne/GF/examples/letter">Letter grammar package</A>.
|
|
</P>
|
|
<A NAME="toc96"></A>
|
|
<H2>Communicating with GF</H2>
|
|
<P>
|
|
Other processes can communicate with the GF command interpreter,
|
|
and also with the GF syntax editor. Useful flags when invoking GF are
|
|
</P>
|
|
<UL>
|
|
<LI><CODE>-batch</CODE> suppresses the prompts and structures the communication with XML tags.
|
|
<LI><CODE>-s</CODE> suppresses non-output non-error messages and XML tags.
|
|
<LI><CODE>-nocpu</CODE> suppresses CPU time indication.
|
|
</UL>
|
|
|
|
<P>
|
|
Thus the most silent way to invoke GF is
|
|
</P>
|
|
<PRE>
|
|
gf -batch -s -nocpu
|
|
</PRE>
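<P>
In this mode, commands can also be piped to GF from a file or a script,
e.g. (assuming a file <CODE>commands.txt</CODE> containing a sequence of
GF commands):
</P>
<PRE>
  gf -batch -s -nocpu < commands.txt
</PRE>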
|
|
<P></P>
|
|
<A NAME="toc97"></A>
|
|
<H1>Embedded grammars in Haskell and Java</H1>
|
|
<P>
|
|
GF grammars can be used as parts of programs written in the
|
|
following languages. We will go through a skeleton application in
|
|
Haskell, while the next chapter will show how to build an
|
|
application in Java.
|
|
</P>
|
|
<P>
|
|
We will show how to build a minimal resource grammar
|
|
application whose architecture scales up to much
|
|
larger applications. The application is run from the
|
|
shell by the command
|
|
</P>
|
|
<PRE>
|
|
math
|
|
</PRE>
|
|
<P>
|
|
whereafter it reads user input in English and French.
|
|
To each input line, it answers by the truth value of
|
|
the sentence.
|
|
</P>
|
|
<PRE>
|
|
./math
|
|
zéro est pair
|
|
True
|
|
zero is odd
|
|
False
|
|
zero is even and zero is odd
|
|
False
|
|
</PRE>
|
|
<P>
|
|
The source of the application consists of the following
|
|
files:
|
|
</P>
|
|
<PRE>
|
|
LexEng.gf -- English instance of Lex
|
|
LexFre.gf -- French instance of Lex
|
|
Lex.gf -- lexicon interface
|
|
Makefile -- a makefile
|
|
MathEng.gf -- English instantiation of MathI
|
|
MathFre.gf -- French instantiation of MathI
|
|
Math.gf -- abstract syntax
|
|
MathI.gf -- concrete syntax functor for Math
|
|
Run.hs -- Haskell Main module
|
|
</PRE>
|
|
<P>
|
|
The system was built in 22 steps explained below.
|
|
</P>
|
|
<A NAME="toc98"></A>
|
|
<H2>Writing GF grammars</H2>
|
|
<A NAME="toc99"></A>
|
|
<H3>Creating the first grammar</H3>
|
|
<P>
|
|
1. Write <CODE>Math.gf</CODE>, which defines what you want to say.
|
|
</P>
|
|
<PRE>
|
|
abstract Math = {
|
|
cat Prop ; Elem ;
|
|
fun
|
|
And : Prop -> Prop -> Prop ;
|
|
Even : Elem -> Prop ;
|
|
Zero : Elem ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
2. Write <CODE>Lex.gf</CODE>, which defines which language-dependent
|
|
parts are needed in the concrete syntax. These are mostly
|
|
words (lexicon), but can in fact be any operations. The definitions
|
|
only use resource abstract syntax, which is opened.
|
|
</P>
|
|
<PRE>
|
|
interface Lex = open Syntax in {
|
|
oper
|
|
even_A : A ;
|
|
zero_PN : PN ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
3. Write <CODE>LexEng.gf</CODE>, the English implementation of <CODE>Lex.gf</CODE>.
|
|
This module uses English resource libraries.
|
|
</P>
|
|
<PRE>
|
|
instance LexEng of Lex = open SyntaxEng, ParadigmsEng in {
  oper
    even_A = mkA "even" ;
    zero_PN = mkPN "zero" ;
}
|
|
</PRE>
|
|
<P>
|
|
4. Write <CODE>MathI.gf</CODE>, a language-independent concrete syntax of
|
|
<CODE>Math.gf</CODE>. It opens interfaces, which makes it an incomplete
module, also known as a parametrized module or functor.
|
|
</P>
|
|
<PRE>
|
|
incomplete concrete MathI of Math =
|
|
|
|
open Syntax, Lex in {
|
|
|
|
flags startcat = Prop ;
|
|
|
|
lincat
|
|
Prop = S ;
|
|
Elem = NP ;
|
|
lin
|
|
And x y = mkS and_Conj x y ;
|
|
Even x = mkS (mkCl x even_A) ;
|
|
Zero = mkNP zero_PN ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
5. Write <CODE>MathEng.gf</CODE>, which is just an instatiation of <CODE>MathI.gf</CODE>,
|
|
replacing the interfaces by their English instances. This is the module
|
|
that will be used as a top module in GF, so it contains a path to
|
|
the libraries.
|
|
</P>
|
|
<PRE>
|
|
--# -path=.:present:prelude

concrete MathEng of Math = MathI with
  (Syntax = SyntaxEng),
  (Lex = LexEng) ;
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc100"></A>
|
|
<H3>Testing</H3>
|
|
<P>
|
|
6. Test the grammar in GF by random generation and parsing.
|
|
</P>
|
|
<PRE>
|
|
$ gf
|
|
> i MathEng.gf
|
|
> gr -tr | l -tr | p
|
|
And (Even Zero) (Even Zero)
|
|
zero is even and zero is even
|
|
And (Even Zero) (Even Zero)
|
|
</PRE>
|
|
<P>
|
|
Importing the grammar will fail if you haven't
|
|
</P>
|
|
<UL>
|
|
<LI>correctly defined your <CODE>GF_LIB_PATH</CODE> as <CODE>GF/lib</CODE>
|
|
<LI>installed the resource package or
|
|
compiled the resource from source by <CODE>make</CODE> in <CODE>GF/lib/resource-1.0</CODE>
|
|
</UL>
|
|
|
|
<A NAME="toc101"></A>
|
|
<H3>Adding a new language</H3>
|
|
<P>
|
|
7. Now it is time to add a new language. Write a French lexicon <CODE>LexFre.gf</CODE>:
|
|
</P>
|
|
<PRE>
|
|
instance LexFre of Lex = open SyntaxFre, ParadigmsFre in {
|
|
oper
|
|
even_A = mkA "pair" ;
|
|
zero_PN = mkPN "zéro" ;
|
|
}
|
|
</PRE>
|
|
<P>
|
|
8. You also need a French concrete syntax, <CODE>MathFre.gf</CODE>:
|
|
</P>
|
|
<PRE>
|
|
--# -path=.:present:prelude
|
|
|
|
concrete MathFre of Math = MathI with
|
|
(Syntax = SyntaxFre),
|
|
(Lex = LexFre) ;
|
|
</PRE>
|
|
<P>
|
|
9. This time, you can test multilingual generation:
|
|
</P>
|
|
<PRE>
|
|
> i MathFre.gf
|
|
> gr | tb
|
|
Even Zero
|
|
zéro est pair
|
|
zero is even
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc102"></A>
<H3>Extending the language</H3>
<P>
10. You want to add a predicate saying that a number is odd.
It is first added to <CODE>Math.gf</CODE>:
</P>
<PRE>
fun Odd : Elem -> Prop ;
</PRE>
<P>
11. You need a new word in <CODE>Lex.gf</CODE>:
</P>
<PRE>
oper odd_A : A ;
</PRE>
<P>
12. Then you can give a language-independent concrete syntax in
<CODE>MathI.gf</CODE>:
</P>
<PRE>
lin Odd x = mkS (mkCl x odd_A) ;
</PRE>
<P>
13. The new word is implemented in <CODE>LexEng.gf</CODE>:
</P>
<PRE>
oper odd_A = mkA "odd" ;
</PRE>
<P>
14. The new word is implemented in <CODE>LexFre.gf</CODE>:
</P>
<PRE>
oper odd_A = mkA "impair" ;
</PRE>
<P>
15. Now you can test with the extended lexicon. First empty
the environment to get rid of the old abstract syntax, then
import the new versions of the grammars.
</P>
<PRE>
> e
> i MathEng.gf
> i MathFre.gf
> gr | tb
And (Odd Zero) (Even Zero)
zéro est impair et zéro est pair
zero is odd and zero is even
</PRE>
<P></P>
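<P>
Collecting the pieces, the abstract syntax of the extended grammar amounts
to something like the following sketch. Only the <CODE>Odd</CODE> rule was
shown above; the module header and the <CODE>Zero</CODE>, <CODE>Even</CODE>,
and <CODE>And</CODE> rules are reconstructed here from the trees used in the
test sessions, so the actual <CODE>Math.gf</CODE> may differ in detail.
</P>
<PRE>
abstract Math = {
  cat
    Prop ; Elem ;
  fun
    Zero : Elem ;
    Even : Elem -> Prop ;
    Odd  : Elem -> Prop ;
    And  : Prop -> Prop -> Prop ;
}
</PRE>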
<A NAME="toc103"></A>
<H2>Building a user program</H2>
<A NAME="toc104"></A>
<H3>Producing a compiled grammar package</H3>
<P>
16. Your grammar is going to be used by persons who do not need
to compile it again. They may not have access to the resource library,
either. Therefore it is advisable to produce a multilingual grammar
package in a single file. We call this package <CODE>math.gfcm</CODE> and
produce it, when we have <CODE>MathEng.gf</CODE> and
<CODE>MathFre.gf</CODE> in the GF state, by the command
</P>
<PRE>
> pm | wf math.gfcm
</PRE>
<P></P>
<A NAME="toc105"></A>
<H3>Writing the Haskell application</H3>
<P>
17. Write the Haskell main file <CODE>Run.hs</CODE>. It uses the <CODE>EmbedAPI</CODE>
module, which defines basic functionality such as parsing.
The answer is produced by an interpreter of the trees returned by the parser.
</P>
<PRE>
module Main where

import GSyntax
import GF.Embed.EmbedAPI

main :: IO ()
main = do
  gr <- file2grammar "math.gfcm"
  loop gr

loop :: MultiGrammar -> IO ()
loop gr = do
  s <- getLine
  interpret gr s
  loop gr

interpret :: MultiGrammar -> String -> IO ()
interpret gr s = do
  let tss = parseAll gr "Prop" s
  case concat tss of
    []  -> putStrLn "no parse"
    t:_ -> print $ answer $ fg t

answer :: GProp -> Bool
answer p = case p of
  GOdd x1    -> odd (value x1)
  GEven x1   -> even (value x1)
  GAnd x1 x2 -> answer x1 && answer x2

value :: GElem -> Int
value e = case e of
  GZero -> 0
</PRE>
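<P>
To see what the interpreter computes, here is a minimal self-contained
sketch that can be loaded in GHCi on its own. The <CODE>GProp</CODE> and
<CODE>GElem</CODE> declarations are hand-written stand-ins for the datatypes
that GF generates into <CODE>GSyntax.hs</CODE> (see the next step), so the
generated file will contain more than this.
</P>
<PRE>
-- Hand-written stand-ins for the generated GSyntax.hs datatypes;
-- the generated module also defines the gf/fg translation functions.
data GElem = GZero
  deriving Show

data GProp
  = GAnd GProp GProp
  | GEven GElem
  | GOdd GElem
  deriving Show

-- The same interpreter as in Run.hs above.
answer :: GProp -> Bool
answer p = case p of
  GOdd x1    -> odd (value x1)
  GEven x1   -> even (value x1)
  GAnd x1 x2 -> answer x1 && answer x2

value :: GElem -> Int
value e = case e of
  GZero -> 0

-- "zero is even and zero is odd" parses to the tree below;
-- even 0 is True and odd 0 is False, so the answer is False.
main :: IO ()
main = print (answer (GAnd (GEven GZero) (GOdd GZero)))
</PRE>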
<P></P>
<P>
18. The syntax trees manipulated by the interpreter are not raw
GF trees, but objects of the Haskell datatype <CODE>GProp</CODE>.
From any GF grammar, a file <CODE>GSyntax.hs</CODE> with
datatypes corresponding to its abstract
syntax can be produced by the command
</P>
<PRE>
> pg -printer=haskell | wf GSyntax.hs
</PRE>
<P>
The module also defines the overloaded functions
<CODE>gf</CODE> and <CODE>fg</CODE> for translating from these types to
raw trees and back.
</P>
<A NAME="toc106"></A>
<H3>Compiling the Haskell grammar</H3>
<P>
19. Before compiling <CODE>Run.hs</CODE>, you must make sure that the
embedded GF modules are found. The easiest way to do this
is by two symbolic links to your GF source directories:
</P>
<PRE>
$ ln -s /home/aarne/GF/src/GF
$ ln -s /home/aarne/GF/src/Transfer/
</PRE>
<P></P>
<P>
20. Now you can run the GHC Haskell compiler to produce the program.
</P>
<PRE>
$ ghc --make -o math Run.hs
</PRE>
<P>
The program can be tested with the command <CODE>./math</CODE>.
</P>
<A NAME="toc107"></A>
<H3>Building a distribution</H3>
<P>
21. For a stand-alone binary-only distribution, only
the two files <CODE>math</CODE> and <CODE>math.gfcm</CODE> are needed.
For a source distribution, the files mentioned in
the beginning of this document are needed.
</P>
<A NAME="toc108"></A>
<H3>Using a Makefile</H3>
<P>
22. As a part of the source distribution, a <CODE>Makefile</CODE> is
essential. The <CODE>Makefile</CODE> is also useful when developing the
application. It should always be possible to build an executable
from source by typing <CODE>make</CODE>. Here is a minimal such <CODE>Makefile</CODE>
(recall that the command lines must be indented with tabs):
</P>
<PRE>
all:
	echo "pm | wf math.gfcm" | gf MathEng.gf MathFre.gf
	echo "pg -printer=haskell | wf GSyntax.hs" | gf math.gfcm
	ghc --make -o math Run.hs
</PRE>
<P></P>
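<P>
Assuming all the files above are in place, a build-and-run session could
look as follows. The dialogue lines are illustrative: by the interpreter
above, "zero is even and zero is odd" parses to
<CODE>GAnd (GEven GZero) (GOdd GZero)</CODE>, whose answer is <CODE>False</CODE>.
</P>
<PRE>
$ make
$ ./math
zero is even and zero is odd
False
zero is even
True
</PRE>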
<A NAME="toc109"></A>
<H1>Embedded grammars in Java</H1>
<P>
Forthcoming; at the moment, the document
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~bringert/gf/gf-java.html"><CODE>http://www.cs.chalmers.se/~bringert/gf/gf-java.html</CODE></A>
</P>
<P>
by Björn Bringert gives more information on Java.
</P>
<A NAME="toc110"></A>
<H1>Further reading</H1>
<P>
Syntax Editor User Manual:
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm"><CODE>http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm</CODE></A>
</P>
<P>
Resource Grammar Synopsis (on using resource grammars):
</P>
<P>
<A HREF="../../lib/resource-1.0/synopsis.html"><CODE>http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/synopsis.html</CODE></A>
</P>
<P>
Resource Grammar HOWTO (on writing resource grammars):
</P>
<P>
<A HREF="../../lib/resource-1.0/doc/Resource-HOWTO.html"><CODE>http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/Resource-HOWTO.html</CODE></A>
</P>
<P>
GF Homepage:
</P>
<P>
<A HREF="../.."><CODE>http://www.cs.chalmers.se/~aarne/GF</CODE></A>
</P>
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -thtml -\-toc gf-tutorial2.txt -->
</BODY></HTML>