forked from GitHub/gf-core
progress in tutorial
@@ -1,5 +1,5 @@
|
||||
abstract Gatherer = Paleolithic, Fish, Mushrooms ** {
|
||||
fun
|
||||
UseFish : Fish -> CN ;
|
||||
UseMushroom : Mushroom -> CN ;
|
||||
FishCN : Fish -> CN ;
|
||||
MushroomCN : Mushroom -> CN ;
|
||||
}
|
||||
@@ -7,7 +7,7 @@
|
||||
<P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1>
|
||||
<FONT SIZE="4">
|
||||
<I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR>
|
||||
Last update: Fri Dec 16 22:10:53 2005
|
||||
Last update: Sat Dec 17 13:32:10 2005
|
||||
</FONT></CENTER>
|
||||
|
||||
<P></P>
|
||||
@@ -45,43 +45,55 @@ Last update: Fri Dec 16 22:10:53 2005
|
||||
<LI><A HREF="#toc22">An Italian concrete syntax</A>
|
||||
<LI><A HREF="#toc23">Using a multilingual grammar</A>
|
||||
<LI><A HREF="#toc24">Translation quiz</A>
|
||||
<LI><A HREF="#toc25">The multilingual shell state</A>
|
||||
</UL>
|
||||
<LI><A HREF="#toc26">Grammar architecture</A>
|
||||
<LI><A HREF="#toc25">Grammar architecture</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc27">Extending a grammar</A>
|
||||
<LI><A HREF="#toc28">Multiple inheritance</A>
|
||||
<LI><A HREF="#toc29">Visualizing module structure</A>
|
||||
<LI><A HREF="#toc30">The module structure of ``GathererEng``</A>
|
||||
<LI><A HREF="#toc26">Extending a grammar</A>
|
||||
<LI><A HREF="#toc27">Multiple inheritance</A>
|
||||
<LI><A HREF="#toc28">Visualizing module structure</A>
|
||||
</UL>
|
||||
<LI><A HREF="#toc31">Resource modules</A>
|
||||
<LI><A HREF="#toc29">System commands</A>
|
||||
<LI><A HREF="#toc30">Resource modules</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc32">Parameters and tables</A>
|
||||
<LI><A HREF="#toc33">Inflection tables, paradigms, and ``oper`` definitions</A>
|
||||
<LI><A HREF="#toc34">The ``resource`` module type</A>
|
||||
<LI><A HREF="#toc35">Opening a ``resource``</A>
|
||||
<LI><A HREF="#toc36">Worst-case macros and data abstraction</A>
|
||||
<LI><A HREF="#toc37">A system of paradigms using ``Prelude`` operations</A>
|
||||
<LI><A HREF="#toc38">An intelligent noun paradigm using ``case`` expressions</A>
|
||||
<LI><A HREF="#toc39">Pattern matching</A>
|
||||
<LI><A HREF="#toc40">Morphological analysis and morphology quiz</A>
|
||||
<LI><A HREF="#toc41">Parametric vs. inherent features, agreement</A>
|
||||
<LI><A HREF="#toc42">English concrete syntax with parameters</A>
|
||||
<LI><A HREF="#toc43">Hierarchic parameter types</A>
|
||||
<LI><A HREF="#toc44">Discontinuous constituents</A>
|
||||
<LI><A HREF="#toc31">The golden rule of functional programming</A>
|
||||
<LI><A HREF="#toc32">Operation definitions</A>
|
||||
<LI><A HREF="#toc33">The ``resource`` module type</A>
|
||||
<LI><A HREF="#toc34">Opening a ``resource``</A>
|
||||
<LI><A HREF="#toc35">Division of labour</A>
|
||||
</UL>
|
||||
<LI><A HREF="#toc45">Topics still to be written</A>
|
||||
<LI><A HREF="#toc36">Morphology</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc46">Free variation</A>
|
||||
<LI><A HREF="#toc47">Record extension, tuples</A>
|
||||
<LI><A HREF="#toc48">Predefined types and operations</A>
|
||||
<LI><A HREF="#toc49">Lexers and unlexers</A>
|
||||
<LI><A HREF="#toc50">Grammars of formal languages</A>
|
||||
<LI><A HREF="#toc51">Resource grammars and their reuse</A>
|
||||
<LI><A HREF="#toc52">Embedded grammars in Haskell, Java, and Prolog</A>
|
||||
<LI><A HREF="#toc53">Dependent types, variable bindings, semantic definitions</A>
|
||||
<LI><A HREF="#toc54">Transfer modules</A>
|
||||
<LI><A HREF="#toc55">Alternative input and output grammar formats</A>
|
||||
<LI><A HREF="#toc37">Parameters and tables</A>
|
||||
<LI><A HREF="#toc38">Inflection tables, paradigms, and ``oper`` definitions</A>
|
||||
<LI><A HREF="#toc39">Worst-case macros and data abstraction</A>
|
||||
<LI><A HREF="#toc40">A system of paradigms using ``Prelude`` operations</A>
|
||||
<LI><A HREF="#toc41">An intelligent noun paradigm using ``case`` expressions</A>
|
||||
<LI><A HREF="#toc42">Pattern matching</A>
|
||||
<LI><A HREF="#toc43">Morphological ``resource`` modules</A>
|
||||
<LI><A HREF="#toc44">Testing ``resource`` modules</A>
|
||||
</UL>
|
||||
<LI><A HREF="#toc45">Using morphology in concrete syntax</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc46">Parametric vs. inherent features, agreement</A>
|
||||
<LI><A HREF="#toc47">English concrete syntax with parameters</A>
|
||||
<LI><A HREF="#toc48">Hierarchic parameter types</A>
|
||||
<LI><A HREF="#toc49">Morphological analysis and morphology quiz</A>
|
||||
<LI><A HREF="#toc50">Discontinuous constituents</A>
|
||||
</UL>
|
||||
<LI><A HREF="#toc51">Topics still to be written</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc52">Free variation</A>
|
||||
<LI><A HREF="#toc53">Record extension, tuples</A>
|
||||
<LI><A HREF="#toc54">Predefined types and operations</A>
|
||||
<LI><A HREF="#toc55">Lexers and unlexers</A>
|
||||
<LI><A HREF="#toc56">Grammars of formal languages</A>
|
||||
<LI><A HREF="#toc57">Resource grammars and their reuse</A>
|
||||
<LI><A HREF="#toc58">Interfaces, instances, and functors</A>
|
||||
<LI><A HREF="#toc59">Speech input and output</A>
|
||||
<LI><A HREF="#toc60">Embedded grammars in Haskell, Java, and Prolog</A>
|
||||
<LI><A HREF="#toc61">Dependent types, variable bindings, semantic definitions</A>
|
||||
<LI><A HREF="#toc62">Transfer modules</A>
|
||||
<LI><A HREF="#toc63">Alternative input and output grammar formats</A>
|
||||
</UL>
|
||||
</UL>
|
||||
|
||||
@@ -661,7 +673,7 @@ in subsequent <CODE>fun</CODE> judgements.
|
||||
<P>
|
||||
Each category introduced in <CODE>Paleolithic.gf</CODE> is
|
||||
given a <CODE>lincat</CODE> rule, and each
|
||||
function is given a <CODE>fun</CODE> rule. Similar shorthands
|
||||
function is given a <CODE>lin</CODE> rule. Similar shorthands
|
||||
apply as in <CODE>abstract</CODE> modules.
|
||||
</P>
|
||||
<PRE>
|
||||
@@ -718,7 +730,7 @@ depends on - in this case, <CODE>Paleolithic.gf</CODE>.
|
||||
For each file that is compiled, a <CODE>.gfc</CODE> file
|
||||
is generated. The GFC format (="GF Canonical") is the
|
||||
"machine code" of GF, which is faster to process than
|
||||
GF source files. When reading a module, GF knows whether
|
||||
GF source files. When reading a module, GF decides whether
|
||||
to use an existing <CODE>.gfc</CODE> file or to generate
|
||||
a new one, by looking at modification times.
|
||||
</P>
|
||||
@@ -771,7 +783,7 @@ multilingual grammar.
|
||||
<A NAME="toc23"></A>
|
||||
<H3>Using a multilingual grammar</H3>
|
||||
<P>
|
||||
Import without first emptying
|
||||
Import the two grammars in the same GF session.
|
||||
</P>
|
||||
<PRE>
|
||||
> i PaleolithicEng.gf
|
||||
@@ -794,13 +806,25 @@ Translate by using a pipe:
|
||||
> p -lang=PaleolithicEng "the boy eats the snake" | l -lang=PaleolithicIta
|
||||
il ragazzo mangia il serpente
|
||||
</PRE>
|
||||
<P>
|
||||
The <CODE>lang</CODE> flag tells GF which concrete syntax to use in parsing and
|
||||
linearization. By default, the flag is set to the last-imported grammar.
|
||||
To see what grammars are in scope and which is the main one, use the command
|
||||
<CODE>print_options = po</CODE>:
|
||||
</P>
|
||||
<PRE>
|
||||
> print_options
|
||||
main abstract : Paleolithic
|
||||
main concrete : PaleolithicIta
|
||||
actual concretes : PaleolithicIta PaleolithicEng
|
||||
</PRE>
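<P>
As a small sketch of the <CODE>lang</CODE> flag at work, the pipe shown above can
be run in the opposite direction, parsing with the Italian grammar and
linearizing with the English one. This assumes that both grammars are still
imported in the same session:
</P>
<PRE>
  > p -lang=PaleolithicIta "il ragazzo mangia il serpente" | l -lang=PaleolithicEng
</PRE>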
|
||||
<P></P>
|
||||
<A NAME="toc24"></A>
|
||||
<H3>Translation quiz</H3>
|
||||
<P>
|
||||
This is a simple language exercise that can be automatically
|
||||
generated from a multilingual grammar. The system generates a set of
|
||||
random sentence, displays them in one language, and checks the user's
|
||||
random sentences, displays them in one language, and checks the user's
|
||||
answer given in another language. The command <CODE>translation_quiz = tq</CODE>
|
||||
runs this exercise in a subshell of GF.
|
||||
</P>
|
||||
@@ -827,38 +851,18 @@ file for later use, by the command <CODE>translation_list = tl</CODE>
|
||||
> translation_list -number=25 PaleolithicEng PaleolithicIta
|
||||
</PRE>
|
||||
<P>
|
||||
The number flag gives the number of sentences generated.
|
||||
The <CODE>number</CODE> flag gives the number of sentences generated.
|
||||
</P>
|
||||
<A NAME="toc25"></A>
|
||||
<H3>The multilingual shell state</H3>
|
||||
<P>
|
||||
A GF shell is at any time in a state, which
|
||||
contains a multilingual grammar. One of the concrete
|
||||
syntaxes is the "main" one, which means that parsing and linearization
|
||||
are performed by using it. By default, the main concrete syntax is the
|
||||
last-imported one. As we saw on previous slide, the <CODE>lang</CODE> flag
|
||||
can be used to change the linearization and parsing grammar.
|
||||
</P>
|
||||
<P>
|
||||
To see what the multilingual grammar is (as well as some other
|
||||
things), you can use the command
|
||||
<CODE>print_options = po</CODE>:
|
||||
</P>
|
||||
<PRE>
|
||||
> print_options
|
||||
main abstract : Paleolithic
|
||||
main concrete : PaleolithicIta
|
||||
all concretes : PaleolithicIta PaleolithicEng
|
||||
</PRE>
|
||||
<P></P>
|
||||
<A NAME="toc26"></A>
|
||||
<H2>Grammar architecture</H2>
|
||||
<A NAME="toc27"></A>
|
||||
<A NAME="toc26"></A>
|
||||
<H3>Extending a grammar</H3>
|
||||
<P>
|
||||
The module system of GF makes it possible to <B>extend</B> a
|
||||
grammar in different ways. The syntax of extension is
|
||||
shown by the following example.
|
||||
shown by the following example. This is how language
|
||||
was extended when civilization advanced from the
|
||||
paleolithic to the neolithic age:
|
||||
</P>
|
||||
<PRE>
|
||||
abstract Neolithic = Paleolithic ** {
|
||||
@@ -883,11 +887,12 @@ be built for concrete syntaxes:
|
||||
The effect of extension is that all of the contents of the extended
|
||||
and extending module are put together.
|
||||
</P>
|
||||
<A NAME="toc28"></A>
|
||||
<A NAME="toc27"></A>
|
||||
<H3>Multiple inheritance</H3>
|
||||
<P>
|
||||
Specialized vocabularies can be represented as small grammars that
|
||||
only do "one thing" each, e.g.
|
||||
only do "one thing" each. For instance, the following are grammars
|
||||
for fish names and mushroom names.
|
||||
</P>
|
||||
<PRE>
|
||||
abstract Fish = {
|
||||
@@ -908,12 +913,12 @@ same time:
|
||||
<PRE>
|
||||
abstract Gatherer = Paleolithic, Fish, Mushrooms ** {
|
||||
fun
|
||||
UseFish : Fish -> CN ;
|
||||
UseMushroom : Mushroom -> CN ;
|
||||
FishCN : Fish -> CN ;
|
||||
MushroomCN : Mushroom -> CN ;
|
||||
}
|
||||
</PRE>
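<P>
A concrete syntax can inherit from several modules in the same way. The
following is only a sketch: the module names <CODE>FishEng</CODE> and
<CODE>MushroomsEng</CODE> are assumed here, and so is the choice that
<CODE>Fish</CODE>, <CODE>Mushroom</CODE>, and <CODE>CN</CODE> all have the
same linearization type, so that the argument can be returned unchanged.
</P>
<PRE>
  -- sketch; FishEng and MushroomsEng are assumed module names
  concrete GathererEng of Gatherer =
    PaleolithicEng, FishEng, MushroomsEng ** {
    lin
      FishCN f = f ;       -- assumes Fish and CN share a linearization type
      MushroomCN m = m ;   -- assumes Mushroom and CN share a linearization type
  }
</PRE>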
|
||||
<P></P>
|
||||
<A NAME="toc29"></A>
|
||||
<A NAME="toc28"></A>
|
||||
<H3>Visualizing module structure</H3>
|
||||
<P>
|
||||
When you have created all the abstract syntaxes and
|
||||
@@ -926,9 +931,28 @@ dependences look like, you can use the command
|
||||
> visualize_graph
|
||||
</PRE>
|
||||
<P>
|
||||
and the graph will pop up in a separate window. It can also
|
||||
be printed out into a file, e.g. a <CODE>.gif</CODE> file that
|
||||
can be included in an HTML document
|
||||
and the graph will pop up in a separate window.
|
||||
</P>
|
||||
<P>
|
||||
The graph uses
|
||||
</P>
|
||||
<UL>
|
||||
<LI>oval boxes for abstract modules
|
||||
<LI>square boxes for concrete modules
|
||||
<LI>black-headed arrows for inheritance
|
||||
<LI>white-headed arrows for the concrete-of-abstract relation
|
||||
<P></P>
|
||||
<IMG ALIGN="middle" SRC="Gatherer.gif" BORDER="0" ALT="">
|
||||
</UL>
|
||||
|
||||
<A NAME="toc29"></A>
|
||||
<H2>System commands</H2>
|
||||
<P>
|
||||
To document your grammar, you may want to print the
|
||||
graph into a file, e.g. a <CODE>.gif</CODE> file that
|
||||
can be included in an HTML document. You can do this
|
||||
by first printing the graph into a file <CODE>.dot</CODE> and then
|
||||
processing this file with the <CODE>dot</CODE> program.
|
||||
</P>
|
||||
<PRE>
|
||||
> pm -printer=graph | wf Gatherer.dot
|
||||
@@ -941,25 +965,147 @@ shell escape symbol <CODE>!</CODE>. The resulting graph is shown in the next sec
|
||||
<P>
|
||||
The command <CODE>print_multi = pm</CODE> is used for printing the current multilingual
|
||||
grammar in various formats, of which the format <CODE>-printer=graph</CODE> just
|
||||
shows the module dependencies.
|
||||
shows the module dependencies. Use the <CODE>help</CODE> command to see what other formats
|
||||
are available:
|
||||
</P>
|
||||
<PRE>
|
||||
> help pm
|
||||
> help -printer
|
||||
</PRE>
|
||||
<P></P>
|
||||
<A NAME="toc30"></A>
|
||||
<H3>The module structure of ``GathererEng``</H3>
|
||||
<H2>Resource modules</H2>
|
||||
<A NAME="toc31"></A>
|
||||
<H3>The golden rule of functional programming</H3>
|
||||
<P>
|
||||
The graph uses
|
||||
In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format still looks rather
|
||||
verbose and requires many more characters to be written. You have probably
|
||||
done this by the copy-paste-modify method, which is a standard way to
|
||||
avoid repeating work.
|
||||
</P>
|
||||
<P>
|
||||
However, there is a more elegant way to avoid repeating work than the copy-and-paste
|
||||
method. The <B>golden rule of functional programming</B> says that
|
||||
</P>
|
||||
<UL>
|
||||
<LI>oval boxes for abstract modules
|
||||
<LI>square boxes for concrete modules
|
||||
<LI>black-headed arrows for inheritance
|
||||
<LI>white-headed arrows for the concrete-of-abstract relation
|
||||
<LI>whenever you find yourself programming by copy-and-paste, write a function instead.
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
<img src="Gatherer.gif">
|
||||
A function separates the shared parts of different computations from the
|
||||
changing parts, parameters. In functional programming languages, such as
|
||||
<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share much more than in
|
||||
languages such as C and Java.
|
||||
</P>
|
||||
<A NAME="toc31"></A>
|
||||
<H2>Resource modules</H2>
|
||||
<A NAME="toc32"></A>
|
||||
<H3>Operation definitions</H3>
|
||||
<P>
|
||||
GF is a functional programming language, not only in the sense that
|
||||
the abstract syntax is a system of functions (<CODE>fun</CODE>), but also because
|
||||
functional programming can be used to define concrete syntax. This is
|
||||
done by using a new form of judgement, with the keyword <CODE>oper</CODE> (for
|
||||
<B>operation</B>), distinct from <CODE>fun</CODE> for the sake of clarity.
|
||||
Here is a simple example of an operation:
|
||||
</P>
|
||||
<PRE>
|
||||
oper ss : Str -> {s : Str} = \x -> {s = x} ;
|
||||
</PRE>
|
||||
<P>
|
||||
The operation can be <B>applied</B> to an argument, and GF will
|
||||
<B>compute</B> the application into a value. For instance,
|
||||
</P>
|
||||
<PRE>
|
||||
ss "boy" ---> {s = "boy"}
|
||||
</PRE>
|
||||
<P>
|
||||
(We use the symbol <CODE>---></CODE> to indicate how an expression is
|
||||
computed into a value; this symbol is not a part of GF.)
|
||||
</P>
|
||||
<P>
|
||||
Thus an <CODE>oper</CODE> judgement includes the name of the defined operation,
|
||||
its type, and an expression defining it. As for the syntax of the defining
|
||||
expression, notice the <B>lambda abstraction</B> form <CODE>\x -> t</CODE> of
|
||||
the function.
|
||||
</P>
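<P>
As a further illustration, here is a hypothetical operation <CODE>twice</CODE>
(a name chosen only for this sketch), whose defining expression uses the bound
variable two times, together with the computation of one application:
</P>
<PRE>
  -- hypothetical operation: x is bound by the lambda and used twice
  oper twice : Str -> {s : Str} = \x -> {s = x ++ x} ;

  twice "boy"   --->   {s = "boy" ++ "boy"}
</PRE>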
|
||||
<A NAME="toc33"></A>
|
||||
<H3>The ``resource`` module type</H3>
|
||||
<P>
|
||||
Operation definitions can be included in a concrete syntax.
|
||||
But they are not really tied to a particular set of linearization rules.
|
||||
They should rather be seen as <B>resources</B>
|
||||
usable in many concrete syntaxes.
|
||||
</P>
|
||||
<P>
|
||||
The <CODE>resource</CODE> module type can be used to package
|
||||
<CODE>oper</CODE> definitions into reusable resources. Here is
|
||||
an example, with a handful of operations to manipulate
|
||||
strings and records.
|
||||
</P>
|
||||
<PRE>
|
||||
resource StringOper = {
|
||||
oper
|
||||
SS : Type = {s : Str} ;
|
||||
|
||||
ss : Str -> SS = \x -> {s = x} ;
|
||||
|
||||
cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;
|
||||
|
||||
prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
|
||||
}
|
||||
</PRE>
|
||||
<P>
|
||||
Resource modules can extend other resource modules, in the
|
||||
same way as modules of other types can extend modules of the
|
||||
same type. Thus it is possible to build resource hierarchies.
|
||||
</P>
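<P>
A minimal sketch of such an extension could look as follows. The module name
<CODE>StringOperExt</CODE> and the operations <CODE>postfix</CODE> and
<CODE>embed</CODE> are hypothetical and only serve as illustration; the
inherited <CODE>SS</CODE> and <CODE>ss</CODE> come from <CODE>StringOper</CODE>.
</P>
<PRE>
  -- sketch: StringOperExt, postfix, and embed are hypothetical names
  resource StringOperExt = StringOper ** {
    oper
      -- put a string after the argument
      postfix : Str -> SS -> SS = \p,x -> ss (x.s ++ p) ;

      -- put a string between two arguments
      embed : Str -> SS -> SS -> SS = \f,x,y -> ss (x.s ++ f ++ y.s) ;
  }
</PRE>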
|
||||
<A NAME="toc34"></A>
|
||||
<H3>Opening a ``resource``</H3>
|
||||
<P>
|
||||
Any number of <CODE>resource</CODE> modules can be
|
||||
<B>opened</B> in a <CODE>concrete</CODE> syntax, which
|
||||
makes the definitions contained
|
||||
in the resource usable in the concrete syntax. Here is
|
||||
an example, where the resource <CODE>StringOper</CODE> is
|
||||
opened in a new version of <CODE>PaleolithicEng</CODE>.
|
||||
</P>
|
||||
<PRE>
|
||||
concrete PalEng of Paleolithic = open StringOper in {
|
||||
lincat
|
||||
S, NP, VP, CN, A, V, TV = SS ;
|
||||
lin
|
||||
PredVP = cc ;
|
||||
UseV v = v ;
|
||||
ComplTV = cc ;
|
||||
UseA = prefix "is" ;
|
||||
This = prefix "this" ;
|
||||
That = prefix "that" ;
|
||||
Def = prefix "the" ;
|
||||
Indef = prefix "a" ;
|
||||
ModA = cc ;
|
||||
Boy = ss "boy" ;
|
||||
Louse = ss "louse" ;
|
||||
Snake = ss "snake" ;
|
||||
-- etc
|
||||
}
|
||||
</PRE>
|
||||
<P>
|
||||
The same string operations could be used to write <CODE>PaleolithicIta</CODE>
|
||||
more concisely.
|
||||
</P>
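<P>
A fragment of such an Italian version might be sketched as follows. The module
name <CODE>PalIta</CODE> is assumed by analogy with <CODE>PalEng</CODE>, and
only rules whose Italian strings appear in the translation example above are
given.
</P>
<PRE>
  -- sketch; PalIta is an assumed module name
  concrete PalIta of Paleolithic = open StringOper in {
    lincat
      S, NP, VP, CN, A, V, TV = SS ;
    lin
      PredVP = cc ;
      UseV v = v ;
      ComplTV = cc ;
      Def = prefix "il" ;      -- only correct for some masculine nouns
      Boy = ss "ragazzo" ;
      Snake = ss "serpente" ;
      -- etc
  }
</PRE>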
|
||||
<A NAME="toc35"></A>
|
||||
<H3>Division of labour</H3>
|
||||
<P>
|
||||
Using operations defined in resource modules is a
|
||||
way to avoid repetitive code.
|
||||
In addition, it enables a new kind of modularity
|
||||
and division of labour in grammar writing: grammarians familiar with
|
||||
the linguistic details of a language can make this knowledge
|
||||
available through resource grammar modules, whose users only need
|
||||
to pick the right operations and not to know their implementation
|
||||
details.
|
||||
</P>
|
||||
<A NAME="toc36"></A>
|
||||
<H2>Morphology</H2>
|
||||
<P>
|
||||
Suppose we want to say, with the vocabulary included in
|
||||
<CODE>Paleolithic.gf</CODE>, things like
|
||||
@@ -985,15 +1131,16 @@ The introduction of plural forms requires two things:
|
||||
<P>
|
||||
Different languages have different rules of inflection and agreement.
|
||||
For instance, Italian has also agreement in gender (masculine vs. feminine).
|
||||
We want to express such special features of languages precisely in
|
||||
concrete syntax while ignoring them in abstract syntax.
|
||||
We want to express such special features of languages in the
|
||||
concrete syntax while ignoring them in the abstract syntax.
|
||||
</P>
|
||||
<P>
|
||||
To be able to do all this, we need two new judgement forms,
|
||||
a new module form, and a generalizarion of linearization types
|
||||
To be able to do all this, we need one new judgement form,
|
||||
many new expression forms,
|
||||
and a generalization of linearization types
|
||||
from strings to more complex types.
|
||||
</P>
|
||||
<A NAME="toc32"></A>
|
||||
<A NAME="toc37"></A>
|
||||
<H3>Parameters and tables</H3>
|
||||
<P>
|
||||
We define the <B>parameter type</B> of number in English by
|
||||
@@ -1012,7 +1159,7 @@ with a type where the <CODE>s</CODE> field is a <B>table</B> depending on number
|
||||
</PRE>
|
||||
<P>
|
||||
The <B>table type</B> <CODE>Number => Str</CODE> is in many respects similar to
|
||||
a function type (<CODE>Number -> Str</CODE>). The main restriction is that the
|
||||
a function type (<CODE>Number -> Str</CODE>). The main difference is that the
|
||||
argument type of a table type must always be a parameter type. This means
|
||||
that the argument-value pairs can be listed in a finite table. The following
|
||||
example shows such a table:
|
||||
@@ -1034,7 +1181,7 @@ operator <CODE>!</CODE>. For instance,
|
||||
<P>
|
||||
is a selection, whose value is <CODE>"boys"</CODE>.
|
||||
</P>
|
||||
<A NAME="toc33"></A>
|
||||
<A NAME="toc38"></A>
|
||||
<H3>Inflection tables, paradigms, and ``oper`` definitions</H3>
|
||||
<P>
|
||||
All English common nouns are inflected in number, most of them in the
|
||||
@@ -1048,9 +1195,8 @@ From GF point of view, a paradigm is a function that takes a <B>lemma</B> -
|
||||
a string also known as a <B>dictionary form</B> - and returns an inflection
|
||||
table of desired type. Paradigms are not functions in the sense of the
|
||||
<CODE>fun</CODE> judgements of abstract syntax (which operate on trees and not
|
||||
on strings). Thus we call them <B>operations</B> for the sake of clarity,
|
||||
introduce one one form of judgement, with the keyword <CODE>oper</CODE>. As an
|
||||
example, the following operation defines the regular noun paradigm of English:
|
||||
on strings), but operations defined in <CODE>oper</CODE> judgements.
|
||||
The following operation defines the regular noun paradigm of English:
|
||||
</P>
|
||||
<PRE>
|
||||
oper regNoun : Str -> {s : Number => Str} = \x -> {
|
||||
@@ -1061,85 +1207,19 @@ example, the following operation defines the regular noun paradigm of English:
|
||||
} ;
|
||||
</PRE>
|
||||
<P>
|
||||
Thus an <CODE>oper</CODE> judgement includes the name of the defined operation,
|
||||
its type, and an expression defining it. As for the syntax of the defining
|
||||
expression, notice the <B>lambda abstraction</B> form <CODE>\x -> t</CODE> of
|
||||
the function, and the <B>glueing</B> operator <CODE>+</CODE> telling that
|
||||
The <B>glueing</B> operator <CODE>+</CODE> tells that
|
||||
the string held in the variable <CODE>x</CODE> and the ending <CODE>"s"</CODE>
|
||||
are written together to form one <B>token</B>.
|
||||
</P>
|
||||
<A NAME="toc34"></A>
|
||||
<H3>The ``resource`` module type</H3>
|
||||
<P>
|
||||
Parameter and operator definitions do not belong to the abstract syntax.
|
||||
They can be used when defining concrete syntax - but they are not
|
||||
tied to a particular set of linearization rules.
|
||||
The proper way to see them is as auxiliary concepts, as <B>resources</B>
|
||||
usable in many concrete syntaxes.
|
||||
</P>
|
||||
<P>
|
||||
The <CODE>resource</CODE> module type thus consists of
|
||||
<CODE>param</CODE> and <CODE>oper</CODE> definitions. Here is an
|
||||
example.
|
||||
are written together to form one <B>token</B>. Thus, for instance,
|
||||
</P>
|
||||
<PRE>
|
||||
resource MorphoEng = {
|
||||
param
|
||||
Number = Sg | Pl ;
|
||||
oper
|
||||
Noun : Type = {s : Number => Str} ;
|
||||
regNoun : Str -> Noun = \x -> {
|
||||
s = table {
|
||||
Sg => x ;
|
||||
Pl => x + "s"
|
||||
}
|
||||
} ;
|
||||
}
|
||||
(regNoun "boy").s ! Pl ---> "boy" + "s" ---> "boys"
|
||||
</PRE>
|
||||
<P>
|
||||
Resource modules can extend other resource modules, in the
|
||||
same way as modules of other types can extend modules of the
|
||||
same type.
|
||||
</P>
|
||||
<A NAME="toc35"></A>
|
||||
<H3>Opening a ``resource``</H3>
|
||||
<P>
|
||||
Any number of <CODE>resource</CODE> modules can be
|
||||
<B>opened</B> in a <CODE>concrete</CODE> syntax, which
|
||||
makes the parameter and operation definitions contained
|
||||
in the resource usable in the concrete syntax. Here is
|
||||
an example, where the resource <CODE>MorphoEng</CODE> is
|
||||
open in (the fragment of) a new version of <CODE>PaleolithicEng</CODE>.
|
||||
</P>
|
||||
<PRE>
|
||||
concrete PaleolithicEng of Paleolithic = open MorphoEng in {
|
||||
lincat
|
||||
CN = Noun ;
|
||||
lin
|
||||
Boy = regNoun "boy" ;
|
||||
Snake = regNoun "snake" ;
|
||||
Worm = regNoun "worm" ;
|
||||
}
|
||||
</PRE>
|
||||
<P>
|
||||
Notice that, just like in abstract syntax, function application
|
||||
is written by juxtaposition of the function and the argument.
|
||||
</P>
|
||||
<P>
|
||||
Using operations defined in resource modules is clearly a concise
|
||||
way of giving e.g. inflection tables and other repeated patterns
|
||||
of expression. In addition, it enables a new kind of modularity
|
||||
and division of labour in grammar writing: grammarians familiar with
|
||||
the linguistic details of a language can put this knowledge
|
||||
available through resource grammars, whose users only need
|
||||
to pick the right operations and not to know their implementation
|
||||
details.
|
||||
</P>
|
||||
<A NAME="toc36"></A>
|
||||
<P></P>
|
||||
<A NAME="toc39"></A>
|
||||
<H3>Worst-case macros and data abstraction</H3>
|
||||
<P>
|
||||
Some English nouns, such as <CODE>louse</CODE>, are so irregular that
|
||||
it makes little sense to see them as instances of a paradigm. Even
|
||||
it makes no sense to see them as instances of a paradigm. Even
|
||||
then, it is useful to perform <B>data abstraction</B> from the
|
||||
definition of the type <CODE>Noun</CODE>, and introduce a constructor
|
||||
operation, a <B>worst-case macro</B> for nouns:
|
||||
@@ -1159,6 +1239,13 @@ Thus we define
|
||||
lin Louse = mkNoun "louse" "lice" ;
|
||||
</PRE>
|
||||
<P>
|
||||
and
|
||||
</P>
|
||||
<PRE>
|
||||
oper regNoun : Str -> Noun = \x ->
|
||||
mkNoun x (x + "s") ;
|
||||
</PRE>
|
||||
<P>
|
||||
instead of writing the inflection table explicitly.
|
||||
</P>
|
||||
<P>
|
||||
@@ -1169,48 +1256,47 @@ interface (i.e. the system of type signatures) that makes it
|
||||
correct to use these functions in concrete modules. In programming
|
||||
terms, <CODE>Noun</CODE> is then treated as an <B>abstract datatype</B>.
|
||||
</P>
|
||||
<A NAME="toc37"></A>
|
||||
<A NAME="toc40"></A>
|
||||
<H3>A system of paradigms using ``Prelude`` operations</H3>
|
||||
<P>
|
||||
The regular noun paradigm <CODE>regNoun</CODE> can - and should - of course be defined
|
||||
by the worst-case macro <CODE>mkNoun</CODE>. In addition, some more noun paradigms
|
||||
could be defined, for instance,
|
||||
In addition to the completely regular noun paradigm <CODE>regNoun</CODE>,
|
||||
some other frequent noun paradigms deserve to be
|
||||
defined, for instance,
|
||||
</P>
|
||||
<PRE>
|
||||
regNoun : Str -> Noun = \snake -> mkNoun snake (snake + "s") ;
|
||||
sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ;
|
||||
sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ;
|
||||
</PRE>
|
||||
<P>
|
||||
What about nouns like <I>fly</I>, with the plural <I>flies</I>? The already
|
||||
available solution is to use the so-called "technical stem" <I>fl</I> as
|
||||
argument, and define
|
||||
available solution is to use the longest common prefix
|
||||
<I>fl</I> (also known as the <B>technical stem</B>) as argument, and define
|
||||
</P>
|
||||
<PRE>
|
||||
yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ;
|
||||
yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ;
|
||||
</PRE>
|
||||
<P>
|
||||
But this paradigm would be very unintuitive to use, because the "technical stem"
|
||||
is not even an existing form of the word. A better solution is to use
|
||||
the string operator <CODE>init</CODE>, which returns the initial segment (i.e.
|
||||
But this paradigm would be very unintuitive to use, because the technical stem
|
||||
is not an existing form of the word. A better solution is to use
|
||||
the lemma and a string operator <CODE>init</CODE>, which returns the initial segment (i.e.
|
||||
all characters but the last) of a string:
|
||||
</P>
|
||||
<PRE>
|
||||
yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
|
||||
yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
|
||||
</PRE>
|
||||
<P>
|
||||
The operator <CODE>init</CODE> belongs to a set of operations in the
|
||||
resource module <CODE>Prelude</CODE>, which therefore has to be
|
||||
<CODE>open</CODE>ed so that <CODE>init</CODE> can be used.
|
||||
</P>
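<P>
To see how this version of <CODE>yNoun</CODE> behaves, the plural form of
<I>fly</I> can be computed step by step, using the definitions given above:
</P>
<PRE>
  init "fly"             --->  "fl"

  (yNoun "fly").s ! Pl   --->  init "fly" + "ies"  --->  "flies"
</PRE>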
|
||||
<A NAME="toc38"></A>
|
||||
<A NAME="toc41"></A>
|
||||
<H3>An intelligent noun paradigm using ``case`` expressions</H3>
|
||||
<P>
|
||||
It may be hard for the user of a resource morphology to pick the right
|
||||
inflection paradigm. A way to help this is to define a more intelligent
|
||||
paradigms, which chooses the ending by first analysing the lemma.
|
||||
paradigm, which chooses the ending by first analysing the lemma.
|
||||
The following variant for English regular nouns puts together all the
|
||||
previously shown paradigms, and chooses one of them on the basis of
|
||||
the final letter of the lemma.
|
||||
the final letter of the lemma (found by the prelude operator <CODE>last</CODE>).
|
||||
</P>
|
||||
<PRE>
|
||||
regNoun : Str -> Noun = \s -> case last s of {
|
||||
@@ -1221,7 +1307,7 @@ the final letter of the lemma.
|
||||
</PRE>
|
||||
<P>
|
||||
This definition displays many GF expression forms not shown before;
|
||||
these forms are explained in the following section.
|
||||
these forms are explained in the next section.
|
||||
</P>
|
||||
<P>
|
||||
The paradigm <CODE>regNoun</CODE> does not give the correct forms for
|
||||
@@ -1232,7 +1318,7 @@ this, either use <CODE>mkNoun</CODE> or modify
|
||||
<CODE>regNoun</CODE> so that the <CODE>"y"</CODE> case does not
|
||||
apply if the second-last character is a vowel.
|
||||
</P>
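<P>
One possible way to make this modification is sketched below. It is only an
illustration, not necessarily the definition used in the tutorial's own files:
the second-last character is found as <CODE>last (init s)</CODE>, and a nested
<CODE>case</CODE> expression checks whether it is a vowel.
</P>
<PRE>
  regNoun : Str -> Noun = \s -> case last s of {
    "s" | "z" => mkNoun s (s + "es") ;
    "y" => case last (init s) of {
      -- vowel before "y": plain "s" plural, as in "boys"
      "a" | "e" | "i" | "o" | "u" => mkNoun s (s + "s") ;
      -- otherwise "y" is replaced by "ies", as in "flies"
      _ => mkNoun s (init s + "ies")
      } ;
    _ => mkNoun s (s + "s")
    } ;
</PRE>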
|
||||
<A NAME="toc39"></A>
|
||||
<A NAME="toc42"></A>
|
||||
<H3>Pattern matching</H3>
|
||||
<P>
|
||||
Expressions of the <CODE>table</CODE> form are built from lists of
|
||||
@@ -1252,7 +1338,7 @@ then performed by <B>pattern matching</B>:
|
||||
|
||||
<P>
|
||||
Pattern matching is performed in the order in which the branches
|
||||
appear in the table.
|
||||
appear in the table: the branch of the first matching pattern is followed.
|
||||
</P>
|
||||
<P>
|
||||
As syntactic sugar, one-branch tables can be written concisely,
|
||||
@@ -1268,54 +1354,119 @@ programming languages are syntactic sugar for table selections:
|
||||
case e of {...} === table {...} ! e
|
||||
</PRE>
|
||||
<P></P>
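<P>
As an illustration of this equivalence, the two hypothetical operations below
(the names are chosen only for this sketch) should compute to the same values
for both arguments of type <CODE>Number</CODE>:
</P>
<PRE>
  -- written with a case expression
  oper boyForm : Number -> Str = \n -> case n of {
    Sg => "boy" ;
    Pl => "boys"
    } ;

  -- the same operation written as a table selection
  oper boyForm2 : Number -> Str = \n -> table {
    Sg => "boy" ;
    Pl => "boys"
    } ! n ;
</PRE>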
|
||||
<A NAME="toc40"></A>
|
||||
<H3>Morphological analysis and morphology quiz</H3>
|
||||
<A NAME="toc43"></A>
|
||||
<H3>Morphological ``resource`` modules</H3>
|
||||
<P>
|
||||
Even though in GF morphology
|
||||
is mostly seen as an auxiliary of syntax, a morphology once defined
|
||||
can be used on its own right. The command <CODE>morpho_analyse = ma</CODE>
|
||||
can be used to read a text and return for each word the analyses that
|
||||
it has in the current concrete syntax.
|
||||
A common idiom is to
|
||||
gather the <CODE>oper</CODE> and <CODE>param</CODE> definitions
|
||||
needed for inflecting words in
|
||||
a language into a morphology module. Here is a simple
|
||||
example, <A HREF="MorphoEng.gf"><CODE>MorphoEng</CODE></A>.
|
||||
</P>
|
||||
<PRE>
|
||||
> rf bible.txt | morpho_analyse
|
||||
</PRE>
|
||||
<P>
|
||||
Similarly to translation exercises, morphological exercises can
|
||||
be generated, by the command <CODE>morpho_quiz = mq</CODE>. Usually,
|
||||
the category is set to be something else than <CODE>S</CODE>. For instance,
|
||||
</P>
|
||||
<PRE>
|
||||
> i lib/resource/french/VerbsFre.gf
|
||||
> morpho_quiz -cat=V
|
||||
--# -path=.:prelude
|
||||
|
||||
Welcome to GF Morphology Quiz.
|
||||
...
|
||||
resource MorphoEng = open Prelude in {
|
||||
|
||||
réapparaître : VFin VCondit Pl P2
|
||||
réapparaitriez
|
||||
> No, not réapparaitriez, but
|
||||
réapparaîtriez
|
||||
Score 0/1
|
||||
param
|
||||
Number = Sg | Pl ;
|
||||
|
||||
oper
|
||||
Noun, Verb : Type = {s : Number => Str} ;
|
||||
|
||||
mkNoun : Str -> Str -> Noun = \x,y -> {
|
||||
s = table {
|
||||
Sg => x ;
|
||||
Pl => y
|
||||
}
|
||||
} ;
|
||||
|
||||
regNoun : Str -> Noun = \s -> case last s of {
|
||||
"s" | "z" => mkNoun s (s + "es") ;
|
||||
"y" => mkNoun s (init s + "ies") ;
|
||||
_ => mkNoun s (s + "s")
|
||||
} ;
|
||||
|
||||
mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ;
|
||||
|
||||
regVerb : Str -> Verb = \s -> case last s of {
|
||||
"s" | "z" => mkVerb s (s + "es") ;
|
||||
"y" => mkVerb s (init s + "ies") ;
|
||||
"o" => mkVerb s (s + "es") ;
|
||||
_ => mkVerb s (s + "s")
|
||||
} ;
|
||||
}
|
||||
</PRE>
|
||||
<P>
|
||||
Finally, a list of morphological exercises and save it in a
|
||||
file for later use, by the command <CODE>morpho_list = ml</CODE>
|
||||
The first line gives as a hint to the compiler the
|
||||
<B>search path</B> needed to find all the other modules that the
|
||||
module depends on. The directory <CODE>prelude</CODE> is a subdirectory of
|
||||
<CODE>GF/lib</CODE>; to be able to refer to it in this simple way, you can
|
||||
set the environment variable <CODE>GF_LIB_PATH</CODE> to point to this
|
||||
directory.
|
||||
</P>
|
||||
<A NAME="toc44"></A>
|
||||
<H3>Testing ``resource`` modules</H3>
|
||||
<P>
|
||||
To test a <CODE>resource</CODE> module independently, you can import it
|
||||
with a flag that tells GF to retain the <CODE>oper</CODE> definitions
|
||||
in memory; the usual behaviour is that <CODE>oper</CODE> definitions
|
||||
are just applied to compile linearization rules
|
||||
(this is called <B>inlining</B>) and then thrown away.
|
||||
</P>
|
||||
<PRE>
|
||||
> morpho_list -number=25 -cat=V
|
||||
> i -retain MorphoEng.gf
|
||||
</PRE>
|
||||
<P></P>
|
||||
<P>
|
||||
The command <CODE>compute_concrete = cc</CODE> computes any expression
|
||||
formed by operations and other GF constructs. For example,
|
||||
</P>
|
||||
<PRE>
|
||||
> cc regVerb "echo"
|
||||
{s : Number => Str = table Number {
|
||||
Sg => "echoes" ;
|
||||
Pl => "echo"
|
||||
}
|
||||
}
|
||||
</PRE>
|
||||
<P></P>
|
||||
<P>
|
||||
The command <CODE>show_operations = so</CODE> shows the type signatures
|
||||
of all operations returning a given value type:
|
||||
</P>
|
||||
<PRE>
|
||||
> so Verb
|
||||
MorphoEng.mkNoun : Str -> Str -> {s : {MorphoEng.Number} => Str}
|
||||
MorphoEng.mkVerb : Str -> Str -> {s : {MorphoEng.Number} => Str}
|
||||
MorphoEng.regNoun : Str -> {s : {MorphoEng.Number} => Str}
|
||||
MorphoEng.regVerb : Str -> { s : {MorphoEng.Number} => Str}
|
||||
</PRE>
|
||||
<P>
|
||||
The number flag gives the number of exercises generated.
|
||||
Why does the command also show the operations that form
|
||||
<CODE>Noun</CODE>s? The reason is that the type expression
|
||||
<CODE>Verb</CODE> is first computed, and its value happens to be
|
||||
the same as the value of <CODE>Noun</CODE>.
|
||||
</P>
|
||||
<A NAME="toc41"></A>
|
||||
<A NAME="toc45"></A>
|
||||
<H2>Using morphology in concrete syntax</H2>
|
||||
<P>
|
||||
We can now enrich the concrete syntax definitions to
|
||||
comprise morphology. This will involve a more radical
|
||||
variation between languages (e.g. English and Italian)
|
||||
than just the use of different words. In general,
|
||||
parameters and linearization types are different in
|
||||
different languages - but this does not prevent the
|
||||
use of a common abstract syntax.
|
||||
</P>
|
||||
<A NAME="toc46"></A>
|
||||
<H3>Parametric vs. inherent features, agreement</H3>
|
||||
<P>
|
||||
The rule of subject-verb agreement in English says that the verb
|
||||
phrase must be inflected in the number of the subject. This
|
||||
means that a noun phrase (functioning as a subject), in some sense
|
||||
<I>has</I> a number, which it "sends" to the verb. The verb does not
|
||||
have a number, but must be able to receive whatever number the
|
||||
means that a noun phrase (functioning as a subject), inherently
|
||||
<I>has</I> a number, which it passes to the verb. The verb does not
|
||||
<I>have</I> a number, but must be able to receive whatever number the
|
||||
subject has. This distinction is nicely represented by the
|
||||
different linearization types of noun phrases and verb phrases:
|
||||
</P>
|
||||
@@ -1335,7 +1486,7 @@ the predication structure:
|
||||
lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
|
||||
</PRE>
|
||||
<P>
|
||||
The following page will present a new version of
|
||||
The following section will present a new version of
|
||||
<CODE>PaleolithicEng</CODE>, assuming an abstract syntax
|
||||
extended with <CODE>All</CODE> and <CODE>Two</CODE>.
|
||||
It also assumes that <CODE>MorphoEng</CODE> has a paradigm
|
||||
@@ -1344,7 +1495,7 @@ regular only in the present tensse).
|
||||
The reader is invited to inspect the way in which agreement works in
|
||||
the formation of noun phrases and verb phrases.
|
||||
</P>
|
||||
<A NAME="toc42"></A>
|
||||
<A NAME="toc47"></A>
|
||||
<H3>English concrete syntax with parameters</H3>
|
||||
<PRE>
|
||||
concrete PaleolithicEng of Paleolithic = open MorphoEng in {
|
||||
@@ -1372,7 +1523,7 @@ the formation of noun phrases and verb phrases.
|
||||
}
|
||||
</PRE>
|
||||
<P></P>
|
||||
<A NAME="toc43"></A>
|
||||
<A NAME="toc48"></A>
|
||||
<H3>Hierarchic parameter types</H3>
|
||||
<P>
|
||||
The reader familiar with a functional programming language such as
|
||||
@@ -1414,7 +1565,47 @@ the adjectival paradigm in which the two singular forms are the same, can be def
|
||||
}
|
||||
</PRE>
|
||||
<P></P>
|
||||
<A NAME="toc44"></A>
|
||||
<A NAME="toc49"></A>
|
||||
<H3>Morphological analysis and morphology quiz</H3>
|
||||
<P>
|
||||
Even though in GF morphology
|
||||
is mostly seen as an auxiliary of syntax, a morphology once defined
|
||||
can be used in its own right. The command <CODE>morpho_analyse = ma</CODE>
|
||||
can be used to read a text and return for each word the analyses that
|
||||
it has in the current concrete syntax.
|
||||
</P>
|
||||
<PRE>
|
||||
> rf bible.txt | morpho_analyse
|
||||
</PRE>
|
||||
<P>
|
||||
In the same way as translation exercises, morphological exercises can
|
||||
be generated, by the command <CODE>morpho_quiz = mq</CODE>. Usually,
|
||||
the category is set to be something other than <CODE>S</CODE>. For instance,
|
||||
</P>
|
||||
<PRE>
|
||||
> i lib/resource/french/VerbsFre.gf
|
||||
> morpho_quiz -cat=V
|
||||
|
||||
Welcome to GF Morphology Quiz.
|
||||
...
|
||||
|
||||
réapparaître : VFin VCondit Pl P2
|
||||
réapparaitriez
|
||||
> No, not réapparaitriez, but
|
||||
réapparaîtriez
|
||||
Score 0/1
|
||||
</PRE>
|
||||
<P>
|
||||
Finally, you can generate a list of morphological exercises and save it in a
|
||||
file for later use, by the command <CODE>morpho_list = ml</CODE>:
|
||||
</P>
|
||||
<PRE>
|
||||
> morpho_list -number=25 -cat=V
|
||||
</PRE>
|
||||
<P>
|
||||
The number flag gives the number of exercises generated.
|
||||
</P>
|
||||
<A NAME="toc50"></A>
|
||||
<H3>Discontinuous constituents</H3>
|
||||
<P>
|
||||
A linearization type may contain more strings than one.
|
||||
@@ -1439,27 +1630,31 @@ GF currently requires that all fields in linearization records that
|
||||
have a table with value type <CODE>Str</CODE> have as labels
|
||||
either <CODE>s</CODE> or <CODE>s</CODE> with an integer index.
|
||||
</P>
|
||||
<A NAME="toc45"></A>
|
||||
<H2>Topics still to be written</H2>
|
||||
<A NAME="toc46"></A>
|
||||
<H3>Free variation</H3>
|
||||
<A NAME="toc47"></A>
|
||||
<H3>Record extension, tuples</H3>
|
||||
<A NAME="toc48"></A>
|
||||
<H3>Predefined types and operations</H3>
|
||||
<A NAME="toc49"></A>
|
||||
<H3>Lexers and unlexers</H3>
|
||||
<A NAME="toc50"></A>
|
||||
<H3>Grammars of formal languages</H3>
|
||||
<A NAME="toc51"></A>
|
||||
<H3>Resource grammars and their reuse</H3>
|
||||
<H2>Topics still to be written</H2>
|
||||
<A NAME="toc52"></A>
|
||||
<H3>Embedded grammars in Haskell, Java, and Prolog</H3>
|
||||
<H3>Free variation</H3>
|
||||
<A NAME="toc53"></A>
|
||||
<H3>Dependent types, variable bindings, semantic definitions</H3>
|
||||
<H3>Record extension, tuples</H3>
|
||||
<A NAME="toc54"></A>
|
||||
<H3>Transfer modules</H3>
|
||||
<H3>Predefined types and operations</H3>
|
||||
<A NAME="toc55"></A>
|
||||
<H3>Lexers and unlexers</H3>
|
||||
<A NAME="toc56"></A>
|
||||
<H3>Grammars of formal languages</H3>
|
||||
<A NAME="toc57"></A>
|
||||
<H3>Resource grammars and their reuse</H3>
|
||||
<A NAME="toc58"></A>
|
||||
<H3>Interfaces, instances, and functors</H3>
|
||||
<A NAME="toc59"></A>
|
||||
<H3>Speech input and output</H3>
|
||||
<A NAME="toc60"></A>
|
||||
<H3>Embedded grammars in Haskell, Java, and Prolog</H3>
|
||||
<A NAME="toc61"></A>
|
||||
<H3>Dependent types, variable bindings, semantic definitions</H3>
|
||||
<A NAME="toc62"></A>
|
||||
<H3>Transfer modules</H3>
|
||||
<A NAME="toc63"></A>
|
||||
<H3>Alternative input and output grammar formats</H3>
|
||||
|
||||
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
|
||||
|
||||
@@ -523,7 +523,7 @@ in subsequent ``fun`` judgements.
|
||||
|
||||
Each category introduced in ``Paleolithic.gf`` is
|
||||
given a ``lincat`` rule, and each
|
||||
function is given a ``fun`` rule. Similar shorthands
|
||||
function is given a ``lin`` rule. Similar shorthands
|
||||
apply as in ``abstract`` modules.
|
||||
```
|
||||
concrete PaleolithicEng of Paleolithic = {
|
||||
@@ -576,12 +576,10 @@ The GF program does not only read the file
|
||||
``PaleolithicEng.gf``, but also all other files that it
|
||||
depends on - in this case, ``Paleolithic.gf``.
|
||||
|
||||
|
||||
|
||||
For each file that is compiled, a ``.gfc`` file
|
||||
is generated. The GFC format (="GF Canonical") is the
|
||||
"machine code" of GF, which is faster to process than
|
||||
GF source files. When reading a module, GF knows whether
|
||||
GF source files. When reading a module, GF decides whether
|
||||
to use an existing ``.gfc`` file or to generate
|
||||
a new one, by looking at modification times.
|
||||
|
||||
@@ -594,8 +592,6 @@ The main advantage of separating abstract from concrete syntax is that
|
||||
one abstract syntax can be equipped with many concrete syntaxes.
|
||||
A system with this property is called a **multilingual grammar**.
|
||||
|
||||
|
||||
|
||||
Multilingual grammars can be used for applications such as
|
||||
translation. Let us build an Italian concrete syntax for
|
||||
``Paleolithic`` and then test the resulting
|
||||
@@ -641,7 +637,7 @@ lin
|
||||
%--!
|
||||
===Using a multilingual grammar===
|
||||
|
||||
Import without first emptying
|
||||
Import the two grammars in the same GF session.
|
||||
```
|
||||
> i PaleolithicEng.gf
|
||||
> i PaleolithicIta.gf
|
||||
@@ -659,7 +655,16 @@ Translate by using a pipe:
|
||||
> p -lang=PaleolithicEng "the boy eats the snake" | l -lang=PaleolithicIta
|
||||
il ragazzo mangia il serpente
|
||||
```
|
||||
|
||||
The ``lang`` flag tells GF which concrete syntax to use in parsing and
|
||||
linearization. By default, the flag is set to the last-imported grammar.
|
||||
To see what grammars are in scope and which is the main one, use the command
|
||||
``print_options = po``:
|
||||
```
|
||||
> print_options
|
||||
main abstract : Paleolithic
|
||||
main concrete : PaleolithicIta
|
||||
actual concretes : PaleolithicIta PaleolithicEng
|
||||
```
|
||||
|
||||
|
||||
%--!
|
||||
@@ -667,7 +672,7 @@ Translate by using a pipe:
|
||||
|
||||
This is a simple language exercise that can be automatically
|
||||
generated from a multilingual grammar. The system generates a set of
|
||||
random sentence, displays them in one language, and checks the user's
|
||||
random sentences, displays them in one language, and checks the user's
|
||||
answer given in another language. The command ``translation_quiz = tq``
|
||||
does this in a subshell of GF.
|
||||
```
|
||||
@@ -690,31 +695,9 @@ file for later use, by the command ``translation_list = tl``
|
||||
```
|
||||
> translation_list -number=25 PaleolithicEng PaleolithicIta
|
||||
```
|
||||
The number flag gives the number of sentences generated.
|
||||
The ``number`` flag gives the number of sentences generated.
|
||||
|
||||
|
||||
%--!
|
||||
===The multilingual shell state===
|
||||
|
||||
A GF shell is at any time in a state, which
|
||||
contains a multilingual grammar. One of the concrete
|
||||
syntaxes is the "main" one, which means that parsing and linearization
|
||||
are performed by using it. By default, the main concrete syntax is the
|
||||
last-imported one. As we saw on previous slide, the ``lang`` flag
|
||||
can be used to change the linearization and parsing grammar.
|
||||
|
||||
|
||||
|
||||
To see what the multilingual grammar is (as well as some other
|
||||
things), you can use the command
|
||||
``print_options = po``:
|
||||
```
|
||||
> print_options
|
||||
main abstract : Paleolithic
|
||||
main concrete : PaleolithicIta
|
||||
all concretes : PaleolithicIta PaleolithicEng
|
||||
```
|
||||
|
||||
|
||||
%--!
|
||||
==Grammar architecture==
|
||||
@@ -723,7 +706,9 @@ things), you can use the command
|
||||
|
||||
The module system of GF makes it possible to **extend** a
|
||||
grammar in different ways. The syntax of extension is
|
||||
shown by the following example.
|
||||
shown by the following example. This is how language
|
||||
was extended when civilization advanced from the
|
||||
paleolithic to the neolithic age:
|
||||
```
|
||||
abstract Neolithic = Paleolithic ** {
|
||||
fun
|
||||
@@ -750,7 +735,8 @@ and extending module are put together.
|
||||
===Multiple inheritance===
|
||||
|
||||
Specialized vocabularies can be represented as small grammars that
|
||||
only do "one thing" each, e.g.
|
||||
only do "one thing" each. For instance, the following are grammars
|
||||
for fish names and mushroom names.
|
||||
```
|
||||
abstract Fish = {
|
||||
cat Fish ;
|
||||
@@ -768,8 +754,8 @@ same time:
|
||||
```
|
||||
abstract Gatherer = Paleolithic, Fish, Mushrooms ** {
|
||||
fun
|
||||
UseFish : Fish -> CN ;
|
||||
UseMushroom : Mushroom -> CN ;
|
||||
FishCN : Fish -> CN ;
|
||||
MushroomCN : Mushroom -> CN ;
|
||||
}
|
||||
```
|
||||
|
||||
@@ -786,25 +772,7 @@ dependences look like, you can use the command
|
||||
```
|
||||
> visualize_graph
|
||||
```
|
||||
and the graph will pop up in a separate window. It can also
|
||||
be printed out into a file, e.g. a ``.gif`` file that
|
||||
can be included in an HTML document
|
||||
```
|
||||
> pm -printer=graph | wf Gatherer.dot
|
||||
> ! dot -Tgif Gatherer.dot > Gatherer.gif
|
||||
```
|
||||
The latter command is a Unix command, issued from GF by using the
|
||||
shell escape symbol ``!``. The resulting graph is shown in the next section.
|
||||
|
||||
|
||||
|
||||
The command ``print_multi = pm`` is used for printing the current multilingual
|
||||
grammar in various formats, of which the format ``-printer=graph`` just
|
||||
shows the module dependencies.
|
||||
|
||||
|
||||
%--!
|
||||
===The module structure of ``GathererEng``===
|
||||
and the graph will pop up in a separate window.
|
||||
|
||||
The graph uses
|
||||
|
||||
@@ -813,15 +781,166 @@ The graph uses
|
||||
- black-headed arrows for inheritance
|
||||
- white-headed arrows for the concrete-of-abstract relation
|
||||
|
||||
[Gatherer.gif]
|
||||
|
||||
|
||||
|
||||
<img src="Gatherer.gif">
|
||||
%--!
|
||||
==System commands==
|
||||
|
||||
To document your grammar, you may want to print the
|
||||
graph into a file, e.g. a ``.gif`` file that
|
||||
can be included in an HTML document. You can do this
|
||||
by first printing the graph into a file ``.dot`` and then
|
||||
processing this file with the ``dot`` program.
|
||||
```
|
||||
> pm -printer=graph | wf Gatherer.dot
|
||||
> ! dot -Tgif Gatherer.dot > Gatherer.gif
|
||||
```
|
||||
The latter command is a Unix command, issued from GF by using the
|
||||
shell escape symbol ``!``. The resulting graph is shown in the next section.
|
||||
|
||||
|
||||
The command ``print_multi = pm`` is used for printing the current multilingual
|
||||
grammar in various formats, of which the format ``-printer=graph`` just
|
||||
shows the module dependencies. Use the ``help`` command to see what other formats
|
||||
are available:
|
||||
```
|
||||
> help pm
|
||||
> help -printer
|
||||
```
|
||||
|
||||
|
||||
%--!
|
||||
==Resource modules==
|
||||
|
||||
|
||||
===The golden rule of functional programming===
|
||||
|
||||
In comparison to the ``.cf`` format, the ``.gf`` format still looks rather
|
||||
verbose, and requires many more characters to be written. You have probably
|
||||
done this by the copy-paste-modify method, which is a standard way to
|
||||
avoid repeating work.
|
||||
|
||||
However, there is a more elegant way to avoid repeating work than the copy-and-paste
|
||||
method. The **golden rule of functional programming** says that
|
||||
|
||||
- whenever you find yourself programming by copy-and-paste, write a function instead.
|
||||
|
||||
|
||||
A function separates the shared parts of different computations from the
|
||||
changing parts, parameters. In functional programming languages, such as
|
||||
[Haskell http://www.haskell.org], it is possible to share much more than in
|
||||
languages such as C and Java.
|
||||
|
||||
|
||||
===Operation definitions===
|
||||
|
||||
GF is a functional programming language, not only in the sense that
|
||||
the abstract syntax is a system of functions (``fun``), but also because
|
||||
functional programming can be used to define concrete syntax. This is
|
||||
done by using a new form of judgement, with the keyword ``oper`` (for
|
||||
**operation**), distinct from ``fun`` for the sake of clarity.
|
||||
Here is a simple example of an operation:
|
||||
```
|
||||
oper ss : Str -> {s : Str} = \x -> {s = x} ;
|
||||
```
|
||||
The operation can be **applied** to an argument, and GF will
|
||||
**compute** the application into a value. For instance,
|
||||
```
|
||||
ss "boy" ---> {s = "boy"}
|
||||
```
|
||||
(We use the symbol ``--->`` to indicate how an expression is
|
||||
computed into a value; this symbol is not a part of GF)
|
||||
|
||||
Thus an ``oper`` judgement includes the name of the defined operation,
|
||||
its type, and an expression defining it. As for the syntax of the defining
|
||||
expression, notice the **lambda abstraction** form ``\x -> t`` of
|
||||
the function.
|
||||
|
||||
|
||||
|
||||
%--!
|
||||
===The ``resource`` module type===
|
||||
|
||||
Operation definitions can be included in a concrete syntax.
|
||||
But they are not really tied to a particular set of linearization rules.
|
||||
They should rather be seen as **resources**
|
||||
usable in many concrete syntaxes.
|
||||
|
||||
The ``resource`` module type can be used to package
|
||||
``oper`` definitions into reusable resources. Here is
|
||||
an example, with a handful of operations to manipulate
|
||||
strings and records.
|
||||
```
|
||||
resource StringOper = {
|
||||
oper
|
||||
SS : Type = {s : Str} ;
|
||||
|
||||
ss : Str -> SS = \x -> {s = x} ;
|
||||
|
||||
cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;
|
||||
|
||||
prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
|
||||
}
|
||||
```
|
||||
Resource modules can extend other resource modules, in the
|
||||
same way as modules of other types can extend modules of the
|
||||
same type. Thus it is possible to build resource hierarchies.
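
To make the computation of these operations concrete, here is a rough Python analogue of ``StringOper``. It is only a sketch and not GF: a record of type ``{s : Str}`` is modelled as a dictionary, and the helper ``tokens`` is a made-up stand-in for the ``++`` operator, assumed here to join two token strings with a space.

```python
# A Python analogue of the StringOper resource (illustration only, not GF).
# Assumption: a GF record {s : Str} is modelled as a dict with the key "s".

def tokens(x: str, y: str) -> str:
    """Hypothetical stand-in for GF's ++ : put two token strings side by side."""
    return f"{x} {y}"

def ss(x: str) -> dict:
    """ss : Str -> SS, wraps a string in a record."""
    return {"s": x}

def cc(x: dict, y: dict) -> dict:
    """cc : SS -> SS -> SS, concatenates the strings of two records."""
    return ss(tokens(x["s"], y["s"]))

def prefix(p: str, x: dict) -> dict:
    """prefix : Str -> SS -> SS, puts a fixed string in front of a record."""
    return ss(tokens(p, x["s"]))

the_boy = prefix("the", ss("boy"))      # like  prefix "the" (ss "boy")
sentence = cc(the_boy, ss("sleeps"))    # like  cc (...) (ss "sleeps")
```

In this sketch, ``sentence`` is a record whose ``s`` field holds //the boy sleeps//, built without ever writing out a record by hand.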
|
||||
|
||||
|
||||
|
||||
%--!
|
||||
===Opening a ``resource``===
|
||||
|
||||
Any number of ``resource`` modules can be
|
||||
**opened** in a ``concrete`` syntax, which
|
||||
makes definitions contained
|
||||
in the resource usable in the concrete syntax. Here is
|
||||
an example, where the resource ``StringOper`` is
|
||||
opened in a new version of ``PaleolithicEng``.
|
||||
```
|
||||
concrete PalEng of Paleolithic = open StringOper in {
|
||||
lincat
|
||||
S, NP, VP, CN, A, V, TV = SS ;
|
||||
lin
|
||||
PredVP = cc ;
|
||||
UseV v = v ;
|
||||
ComplTV = cc ;
|
||||
UseA = prefix "is" ;
|
||||
This = prefix "this" ;
|
||||
That = prefix "that" ;
|
||||
Def = prefix "the" ;
|
||||
Indef = prefix "a" ;
|
||||
ModA = cc ;
|
||||
Boy = ss "boy" ;
|
||||
Louse = ss "louse" ;
|
||||
Snake = ss "snake" ;
|
||||
-- etc
|
||||
}
|
||||
```
|
||||
The same string operations could be used to write ``PaleolithicIta``
|
||||
more concisely.
|
||||
|
||||
|
||||
%--!
|
||||
===Division of labour===
|
||||
|
||||
Using operations defined in resource modules is a
|
||||
way to avoid repetitive code.
|
||||
In addition, it enables a new kind of modularity
|
||||
and division of labour in grammar writing: grammarians familiar with
|
||||
the linguistic details of a language can make this knowledge
|
||||
available through resource grammar modules, whose users only need
|
||||
to pick the right operations and not to know their implementation
|
||||
details.
|
||||
|
||||
|
||||
|
||||
|
||||
%--!
|
||||
==Morphology==
|
||||
|
||||
Suppose we want to say, with the vocabulary included in
|
||||
``Paleolithic.gf``, things like
|
||||
```
|
||||
@@ -832,8 +951,6 @@ The new grammatical facility we need are the plural forms
|
||||
of nouns and verbs (//boys, sleep//), as opposed to their
|
||||
singular forms.
|
||||
|
||||
|
||||
|
||||
The introduction of plural forms requires two things:
|
||||
|
||||
- to **inflect** nouns and verbs in singular and plural number
|
||||
@@ -841,16 +958,14 @@ The introduction of plural forms requires two things:
|
||||
rule that the verb must have the same number as the subject
|
||||
|
||||
|
||||
|
||||
Different languages have different rules of inflection and agreement.
|
||||
For instance, Italian also has agreement in gender (masculine vs. feminine).
|
||||
We want to express such special features of languages precisely in
|
||||
concrete syntax while ignoring them in abstract syntax.
|
||||
We want to express such special features of languages in the
|
||||
concrete syntax while ignoring them in the abstract syntax.
|
||||
|
||||
|
||||
|
||||
To be able to do all this, we need two new judgement forms,
|
||||
a new module form, and a generalization of linearization types
|
||||
To be able to do all this, we need one new judgement form,
|
||||
many new expression forms,
|
||||
and a generalization of linearization types
|
||||
from strings to more complex types.
|
||||
|
||||
|
||||
@@ -869,7 +984,7 @@ with a type where the ``s`` field is a **table** depending on number:
|
||||
lincat CN = {s : Number => Str} ;
|
||||
```
|
||||
The **table type** ``Number => Str`` is in many respects similar to
|
||||
a function type (``Number -> Str``). The main restriction is that the
|
||||
a function type (``Number -> Str``). The main difference is that the
|
||||
argument type of a table type must always be a parameter type. This means
|
||||
that the argument-value pairs can be listed in a finite table. The following
|
||||
example shows such a table:
|
||||
@@ -897,15 +1012,12 @@ ending //s//. This rule is an example of
|
||||
a **paradigm** - a formula telling how the inflection
|
||||
forms of a word are formed.
|
||||
|
||||
|
||||
|
||||
From the GF point of view, a paradigm is a function that takes a **lemma** -
|
||||
a string also known as a **dictionary form** - and returns an inflection
|
||||
table of desired type. Paradigms are not functions in the sense of the
|
||||
``fun`` judgements of abstract syntax (which operate on trees and not
|
||||
on strings). Thus we call them **operations** for the sake of clarity,
|
||||
introduce one one form of judgement, with the keyword ``oper``. As an
|
||||
example, the following operation defines the regular noun paradigm of English:
|
||||
on strings), but operations defined in ``oper`` judgements.
|
||||
The following operation defines the regular noun paradigm of English:
|
||||
```
|
||||
oper regNoun : Str -> {s : Number => Str} = \x -> {
|
||||
s = table {
|
||||
@@ -914,80 +1026,12 @@ example, the following operation defines the regular noun paradigm of English:
|
||||
}
|
||||
} ;
|
||||
```
|
||||
Thus an ``oper`` judgement includes the name of the defined operation,
|
||||
its type, and an expression defining it. As for the syntax of the defining
|
||||
expression, notice the **lambda abstraction** form ``\x -> t`` of
|
||||
the function, and the **glueing** operator ``+`` telling that
|
||||
The **glueing** operator ``+`` tells that
|
||||
the string held in the variable ``x`` and the ending ``"s"``
|
||||
are written together to form one **token**.
|
||||
|
||||
|
||||
%--!
|
||||
===The ``resource`` module type===
|
||||
|
||||
Parameter and operator definitions do not belong to the abstract syntax.
|
||||
They can be used when defining concrete syntax - but they are not
|
||||
tied to a particular set of linearization rules.
|
||||
The proper way to see them is as auxiliary concepts, as **resources**
|
||||
usable in many concrete syntaxes.
|
||||
|
||||
|
||||
|
||||
The ``resource`` module type thus consists of
|
||||
``param`` and ``oper`` definitions. Here is an
|
||||
example.
|
||||
are written together to form one **token**. Thus, for instance,
|
||||
```
|
||||
resource MorphoEng = {
|
||||
param
|
||||
Number = Sg | Pl ;
|
||||
oper
|
||||
Noun : Type = {s : Number => Str} ;
|
||||
regNoun : Str -> Noun = \x -> {
|
||||
s = table {
|
||||
Sg => x ;
|
||||
Pl => x + "s"
|
||||
}
|
||||
} ;
|
||||
}
|
||||
(regNoun "boy").s ! Pl ---> "boy" + "s" ---> "boys"
|
||||
```
|
||||
Resource modules can extend other resource modules, in the
|
||||
same way as modules of other types can extend modules of the
|
||||
same type.
|
||||
|
||||
|
||||
|
||||
%--!
|
||||
===Opening a ``resource``===
|
||||
|
||||
Any number of ``resource`` modules can be
|
||||
**opened** in a ``concrete`` syntax, which
|
||||
makes the parameter and operation definitions contained
|
||||
in the resource usable in the concrete syntax. Here is
|
||||
an example, where the resource ``MorphoEng`` is
|
||||
open in (the fragment of) a new version of ``PaleolithicEng``.
|
||||
```
|
||||
concrete PaleolithicEng of Paleolithic = open MorphoEng in {
|
||||
lincat
|
||||
CN = Noun ;
|
||||
lin
|
||||
Boy = regNoun "boy" ;
|
||||
Snake = regNoun "snake" ;
|
||||
Worm = regNoun "worm" ;
|
||||
}
|
||||
```
|
||||
Notice that, just like in abstract syntax, function application
|
||||
is written by juxtaposition of the function and the argument.
|
||||
|
||||
|
||||
|
||||
Using operations defined in resource modules is clearly a concise
|
||||
way of giving e.g. inflection tables and other repeated patterns
|
||||
of expression. In addition, it enables a new kind of modularity
|
||||
and division of labour in grammar writing: grammarians familiar with
|
||||
the linguistic details of a language can put this knowledge
|
||||
available through resource grammars, whose users only need
|
||||
to pick the right operations and not to know their implementation
|
||||
details.
|
||||
|
||||
|
||||
|
||||
@@ -995,7 +1039,7 @@ details.
|
||||
===Worst-case macros and data abstraction===
|
||||
|
||||
Some English nouns, such as ``louse``, are so irregular that
|
||||
it makes little sense to see them as instances of a paradigm. Even
|
||||
it makes no sense to see them as instances of a paradigm. Even
|
||||
then, it is useful to perform **data abstraction** from the
|
||||
definition of the type ``Noun``, and introduce a constructor
|
||||
operation, a **worst-case macro** for nouns:
|
||||
@@ -1011,10 +1055,13 @@ Thus we define
|
||||
```
|
||||
lin Louse = mkNoun "louse" "lice" ;
|
||||
```
|
||||
and
|
||||
```
|
||||
oper regNoun : Str -> Noun = \x ->
|
||||
mkNoun x (x + "s") ;
|
||||
```
|
||||
instead of writing the inflection table explicitly.
|
||||
|
||||
|
||||
|
||||
The grammar engineering advantage of worst-case macros is that
|
||||
the author of the resource module may change the definitions of
|
||||
``Noun`` and ``mkNoun``, and still retain the
|
||||
@@ -1027,25 +1074,24 @@ terms, ``Noun`` is then treated as an **abstract datatype**.
|
||||
%--!
|
||||
===A system of paradigms using ``Prelude`` operations===
|
||||
|
||||
The regular noun paradigm ``regNoun`` can - and should - of course be defined
|
||||
by the worst-case macro ``mkNoun``. In addition, some more noun paradigms
|
||||
could be defined, for instance,
|
||||
In addition to the completely regular noun paradigm ``regNoun``,
|
||||
some other frequent noun paradigms deserve to be
|
||||
defined, for instance,
|
||||
```
|
||||
regNoun : Str -> Noun = \snake -> mkNoun snake (snake + "s") ;
|
||||
sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ;
|
||||
sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ;
|
||||
```
|
||||
What about nouns like //fly//, with the plural //flies//? The already
|
||||
available solution is to use the so-called "technical stem" //fl// as
|
||||
argument, and define
|
||||
available solution is to use the longest common prefix
|
||||
//fl// (also known as the **technical stem**) as argument, and define
|
||||
```
|
||||
yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ;
|
||||
yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ;
|
||||
```
|
||||
But this paradigm would be very unintuitive to use, because the "technical stem"
|
||||
is not even an existing form of the word. A better solution is to use
|
||||
the string operator ``init``, which returns the initial segment (i.e.
|
||||
But this paradigm would be very unintuitive to use, because the technical stem
|
||||
is not an existing form of the word. A better solution is to use
|
||||
the lemma and a string operator ``init``, which returns the initial segment (i.e.
|
||||
all characters but the last) of a string:
|
||||
```
|
||||
yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
|
||||
yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
|
||||
```
|
||||
The operator ``init`` belongs to a set of operations in the
|
||||
resource module ``Prelude``, which therefore has to be
|
||||
@@ -1058,10 +1104,10 @@ resource module ``Prelude``, which therefore has to be
|
||||
|
||||
It may be hard for the user of a resource morphology to pick the right
|
||||
inflection paradigm. A way to help this is to define a more intelligent
|
||||
paradigms, which chooses the ending by first analysing the lemma.
|
||||
paradigm, which chooses the ending by first analysing the lemma.
|
||||
The following variant for English regular nouns puts together all the
|
||||
previously shown paradigms, and chooses one of them on the basis of
|
||||
the final letter of the lemma.
|
||||
the final letter of the lemma (found by the prelude operator ``last``).
|
||||
```
|
||||
regNoun : Str -> Noun = \s -> case last s of {
|
||||
"s" | "z" => mkNoun s (s + "es") ;
|
||||
@@ -1070,9 +1116,7 @@ the final letter of the lemma.
|
||||
} ;
|
||||
```
|
||||
This definition displays many GF expression forms not shown before;
|
||||
these forms are explained in the following section.
|
||||
|
||||
|
||||
these forms are explained in the next section.
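
For readers who find it easier to follow the choice of ending in a general-purpose language, the smart paradigm can be approximated in Python. This is a sketch under stated assumptions, not GF: the table ``Number => Str`` is modelled as a dictionary keyed by the strings ``"Sg"`` and ``"Pl"``, the names ``mk_noun`` and ``reg_noun`` are illustrative, and the lemma is assumed to be non-empty.

```python
# Python sketch of the "smart" regular noun paradigm (illustration only, not GF).
# Assumption: an inflection table Number => Str is modelled as a dict
# keyed by the strings "Sg" and "Pl".

def mk_noun(sg: str, pl: str) -> dict:
    """Worst-case macro: both forms are given explicitly."""
    return {"s": {"Sg": sg, "Pl": pl}}

def reg_noun(lemma: str) -> dict:
    """Choose the plural ending by inspecting the final letter of the lemma."""
    final = lemma[-1]                       # plays the role of  last s
    if final in ("s", "z"):
        return mk_noun(lemma, lemma + "es")
    if final == "y":
        # lemma[:-1] plays the role of  init s
        return mk_noun(lemma, lemma[:-1] + "ies")
    return mk_noun(lemma, lemma + "s")      # the catch-all branch  _

plurals = [reg_noun(w)["s"]["Pl"] for w in ("snake", "kiss", "fly")]
```

The branches are tried from top to bottom, so the final ``return`` plays the part of the wildcard pattern. For the three lemmas above, ``plurals`` contains //snakes//, //kisses//, and //flies//.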
|
||||
|
||||
The paradigm ``regNoun`` does not give the correct forms for
|
||||
all nouns. For instance, //louse - lice// and
|
||||
@@ -1101,11 +1145,8 @@ then performed by **pattern matching**:
|
||||
one of the disjuncts matches
|
||||
|
||||
|
||||
|
||||
Pattern matching is performed in the order in which the branches
|
||||
appear in the table.
|
||||
|
||||
|
||||
appear in the table: the branch of the first matching pattern is followed.
|
||||
|
||||
As syntactic sugar, one-branch tables can be written concisely,
|
||||
```
|
||||
@@ -1118,41 +1159,102 @@ programming languages are syntactic sugar for table selections:
|
||||
```
|
||||
|
||||
|
||||
%--!
|
||||
===Morphological ``resource`` modules===
|
||||
|
||||
A common idiom is to
|
||||
gather the ``oper`` and ``param`` definitions
|
||||
needed for inflecting words in
|
||||
a language into a morphology module. Here is a simple
|
||||
example, [``MorphoEng`` MorphoEng.gf].
|
||||
```
|
||||
--# -path=.:prelude
|
||||
|
||||
resource MorphoEng = open Prelude in {
|
||||
|
||||
param
|
||||
Number = Sg | Pl ;
|
||||
|
||||
oper
|
||||
Noun, Verb : Type = {s : Number => Str} ;
|
||||
|
||||
mkNoun : Str -> Str -> Noun = \x,y -> {
|
||||
s = table {
|
||||
Sg => x ;
|
||||
Pl => y
|
||||
}
|
||||
} ;
|
||||
|
||||
regNoun : Str -> Noun = \s -> case last s of {
|
||||
"s" | "z" => mkNoun s (s + "es") ;
|
||||
"y" => mkNoun s (init s + "ies") ;
|
||||
_ => mkNoun s (s + "s")
|
||||
} ;
|
||||
|
||||
mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ;
|
||||
|
||||
regVerb : Str -> Verb = \s -> case last s of {
|
||||
"s" | "z" => mkVerb s (s + "es") ;
|
||||
"y" => mkVerb s (init s + "ies") ;
|
||||
"o" => mkVerb s (s + "es") ;
|
||||
_ => mkVerb s (s + "s")
|
||||
} ;
|
||||
}
|
||||
```
|
||||
The first line gives the compiler a hint: the
|
||||
**search path** needed to find all the other modules that the
|
||||
module depends on. The directory ``prelude`` is a subdirectory of
|
||||
``GF/lib``; to be able to refer to it in this simple way, you can
|
||||
set the environment variable ``GF_LIB_PATH`` to point to this
|
||||
directory.
|
||||
|
||||
|
||||
%--!
|
||||
===Morphological analysis and morphology quiz===
|
||||
===Testing ``resource`` modules===
|
||||
|
||||
Even though in GF morphology
|
||||
is mostly seen as an auxiliary of syntax, a morphology once defined
|
||||
can be used in its own right. The command ``morpho_analyse = ma``
|
||||
can be used to read a text and return for each word the analyses that
|
||||
it has in the current concrete syntax.
|
||||
```
|
||||
> rf bible.txt | morpho_analyse
|
||||
```
|
||||
Similarly to translation exercises, morphological exercises can
|
||||
be generated, by the command ``morpho_quiz = mq``. Usually,
|
||||
the category is set to be something other than ``S``. For instance,
|
||||
```
|
||||
> i lib/resource/french/VerbsFre.gf
|
||||
> morpho_quiz -cat=V
|
||||
To test a ``resource`` module independently, you can import it
|
||||
with a flag that tells GF to retain the ``oper`` definitions
|
||||
in the memory; the usual behaviour is that ``oper`` definitions
|
||||
are just applied to compile linearization rules
|
||||
(this is called **inlining**) and then thrown away.
|
||||
|
||||
Welcome to GF Morphology Quiz.
|
||||
...
|
||||
``` > i -retain MorphoEng.gf
|
||||
|
||||
réapparaître : VFin VCondit Pl P2
|
||||
réapparaitriez
|
||||
> No, not réapparaitriez, but
|
||||
réapparaîtriez
|
||||
Score 0/1
|
||||
The command ``compute_concrete = cc`` computes any expression
|
||||
formed by operations and other GF constructs. For example,
|
||||
```
|
||||
Finally, a list of morphological exercises can be generated and saved in a
|
||||
file for later use, by the command ``morpho_list = ml``
|
||||
> cc regVerb "echo"
|
||||
{s : Number => Str = table Number {
|
||||
Sg => "echoes" ;
|
||||
Pl => "echo"
|
||||
}
|
||||
}
|
||||
```
|
||||
> morpho_list -number=25 -cat=V
|
||||
```
|
||||
The number flag gives the number of exercises generated.
|
||||
|
||||
The command ``show_operations = so`` shows the type signatures
|
||||
of all operations returning a given value type:
|
||||
```
|
||||
> so Verb
|
||||
MorphoEng.mkNoun : Str -> Str -> {s : {MorphoEng.Number} => Str}
|
||||
MorphoEng.mkVerb : Str -> Str -> {s : {MorphoEng.Number} => Str}
|
||||
MorphoEng.regNoun : Str -> {s : {MorphoEng.Number} => Str}
|
||||
MorphoEng.regVerb : Str -> { s : {MorphoEng.Number} => Str}
|
||||
```
|
||||
Why does the command also show the operations that form
|
||||
``Noun``s? The reason is that the type expression
|
||||
``Verb`` is first computed, and its value happens to be
|
||||
the same as the value of ``Noun``.
|
||||
|
||||
|
||||
==Using morphology in concrete syntax==
|
||||
|
||||
We can now enrich the concrete syntax definitions to
|
||||
comprise morphology. This will involve a more radical
|
||||
variation between languages (e.g. English and Italian)
|
||||
than just the use of different words. In general,
|
||||
parameters and linearization types are different in
|
||||
different languages - but this does not prevent the
|
||||
use of a common abstract syntax.
|
||||
|
||||
|
||||
%--!
|
||||
@@ -1160,9 +1262,9 @@ The number flag gives the number of exercises generated.
|
||||
|
||||
The rule of subject-verb agreement in English says that the verb
|
||||
phrase must be inflected in the number of the subject. This
|
||||
means that a noun phrase (functioning as a subject), in some sense
|
||||
//has// a number, which it "sends" to the verb. The verb does not
|
||||
have a number, but must be able to receive whatever number the
|
||||
means that a noun phrase (functioning as a subject), inherently
|
||||
//has// a number, which it passes to the verb. The verb does not
|
||||
//have// a number, but must be able to receive whatever number the
|
||||
subject has. This distinction is nicely represented by the
|
||||
different linearization types of noun phrases and verb phrases:
|
||||
```
|
||||
@@ -1179,7 +1281,7 @@ the predication structure:
|
||||
```
|
||||
lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
|
||||
```
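
The agreement mechanism can be tried out in isolation with a small
``resource`` module. This is only a sketch under stated assumptions: the module
name ``AgrDemo`` and the operations ``predVP``, ``boy``, ``boys`` and ``sleep``
are hypothetical, and the types ``NP`` and ``VP`` are local stand-ins for the
linearization types of noun phrases and verb phrases.
```
resource AgrDemo = {

  param
    Number = Sg | Pl ;

  oper
    -- a noun phrase carries its number as an inherent feature
    NP : Type = {s : Str ; n : Number} ;

    -- a verb phrase has a form for every number
    VP : Type = {s : Number => Str} ;

    -- same shape as the PredVP rule: the subject's number
    -- selects the form of the verb phrase
    predVP : NP -> VP -> {s : Str} = \np,vp ->
      {s = np.s ++ vp.s ! np.n} ;

    boy   : NP = {s = "the" ++ "boy"  ; n = Sg} ;
    boys  : NP = {s = "the" ++ "boys" ; n = Pl} ;
    sleep : VP = {s = table {Sg => "sleeps" ; Pl => "sleep"}} ;
}
```
After importing the module with ``i -retain AgrDemo.gf``, the expressions
``predVP boy sleep`` and ``predVP boys sleep`` can be evaluated with ``cc``;
the first should pick the form ``sleeps`` and the second the form ``sleep``.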
|
||||
The following page will present a new version of
|
||||
The following section will present a new version of
|
||||
``PaleolithicEng``, assuming an abstract syntax
|
||||
extended with ``All`` and ``Two``.
|
||||
It also assumes that ``MorphoEng`` has a paradigm
|
||||
@@ -1189,7 +1291,6 @@ The reader is invited to inspect the way in which agreement works in
|
||||
the formation of noun phrases and verb phrases.
|
||||
|
||||
|
||||
|
||||
%--!
|
||||
===English concrete syntax with parameters===
|
||||
|
||||
@@ -1263,6 +1364,42 @@ the adjectival paradigm in which the two singular forms are the same, can be def
|
||||
```
|
||||
|
||||
|
||||
%--!
|
||||
===Morphological analysis and morphology quiz===
|
||||
|
||||
Even though in GF morphology
|
||||
is mostly seen as an auxiliary of syntax, a morphology once defined
|
||||
can be used in its own right. The command ``morpho_analyse = ma``
|
||||
can be used to read a text and return for each word the analyses that
|
||||
it has in the current concrete syntax.
|
||||
```
|
||||
> rf bible.txt | morpho_analyse
|
||||
```
|
||||
In the same way as translation exercises, morphological exercises can
|
||||
be generated, by the command ``morpho_quiz = mq``. Usually,
|
||||
the category is set to be something other than ``S``. For instance,
|
||||
```
|
||||
> i lib/resource/french/VerbsFre.gf
|
||||
> morpho_quiz -cat=V
|
||||
|
||||
Welcome to GF Morphology Quiz.
|
||||
...
|
||||
|
||||
réapparaître : VFin VCondit Pl P2
|
||||
réapparaitriez
|
||||
> No, not réapparaitriez, but
|
||||
réapparaîtriez
|
||||
Score 0/1
|
||||
```
|
||||
Finally, a list of morphological exercises can be generated and saved in a
|
||||
file for later use, by the command ``morpho_list = ml``
|
||||
```
|
||||
> morpho_list -number=25 -cat=V
|
||||
```
|
||||
The number flag gives the number of exercises generated.
|
||||
|
||||
|
||||
|
||||
%--!
|
||||
===Discontinuous constituents===
|
||||
|
||||
@@ -1319,6 +1456,8 @@ either ``s`` or ``s`` with an integer index.
|
||||
===Resource grammars and their reuse===
|
||||
|
||||
|
||||
===Interfaces, instances, and functors===
|
||||
|
||||
|
||||
===Speech input and output===
|
||||
|
||||
|
||||