forked from GitHub/gf-core
updated tutorial and resource howto
@@ -7,7 +7,7 @@
<P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR>
Last update: Wed Jan 25 16:03:03 2006
Last update: Fri Jun 16 01:02:28 2006
</FONT></CENTER>

<P></P>
@@ -34,7 +34,7 @@ Last update: Wed Jan 25 16:03:03 2006
<LI><A HREF="#toc15">Labelled context-free grammars</A>
<LI><A HREF="#toc16">The labelled context-free format</A>
</UL>
<LI><A HREF="#toc17">The ``.gf`` grammar format</A>
<LI><A HREF="#toc17">The .gf grammar format</A>
<UL>
<LI><A HREF="#toc18">Abstract and concrete syntax</A>
<LI><A HREF="#toc19">Judgement forms</A>
@@ -70,8 +70,8 @@ Last update: Wed Jan 25 16:03:03 2006
<UL>
<LI><A HREF="#toc42">Parameters and tables</A>
<LI><A HREF="#toc43">Inflection tables, paradigms, and ``oper`` definitions</A>
<LI><A HREF="#toc44">Worst-case macros and data abstraction</A>
<LI><A HREF="#toc45">A system of paradigms using ``Prelude`` operations</A>
<LI><A HREF="#toc44">Worst-case functions and data abstraction</A>
<LI><A HREF="#toc45">A system of paradigms using Prelude operations</A>
<LI><A HREF="#toc46">An intelligent noun paradigm using ``case`` expressions</A>
<LI><A HREF="#toc47">Pattern matching</A>
<LI><A HREF="#toc48">Morphological ``resource`` modules</A>
@@ -96,34 +96,41 @@ Last update: Wed Jan 25 16:03:03 2006
<LI><A HREF="#toc63">Prefix-dependent choices</A>
<LI><A HREF="#toc64">Predefined types and operations</A>
</UL>
<LI><A HREF="#toc65">More features of the module system</A>
<LI><A HREF="#toc65">More concepts of abstract syntax</A>
<UL>
<LI><A HREF="#toc66">Interfaces, instances, and functors</A>
<LI><A HREF="#toc67">Resource grammars and their reuse</A>
<LI><A HREF="#toc68">Restricted inheritance and qualified opening</A>
<LI><A HREF="#toc66">GF as a logical framework</A>
<LI><A HREF="#toc67">Dependent types</A>
<LI><A HREF="#toc68">Higher-order abstract syntax</A>
<LI><A HREF="#toc69">Semantic definitions</A>
<LI><A HREF="#toc70">List categories</A>
</UL>
<LI><A HREF="#toc69">More concepts of abstract syntax</A>
<LI><A HREF="#toc71">More features of the module system</A>
<UL>
<LI><A HREF="#toc70">Dependent types</A>
<LI><A HREF="#toc71">Higher-order abstract syntax</A>
<LI><A HREF="#toc72">Semantic definitions</A>
<LI><A HREF="#toc73">List categories</A>
<LI><A HREF="#toc72">Interfaces, instances, and functors</A>
<LI><A HREF="#toc73">Resource grammars and their reuse</A>
<LI><A HREF="#toc74">Restricted inheritance and qualified opening</A>
</UL>
<LI><A HREF="#toc74">Transfer modules</A>
<LI><A HREF="#toc75">Practical issues</A>
<LI><A HREF="#toc75">Using the standard resource library</A>
<UL>
<LI><A HREF="#toc76">Lexers and unlexers</A>
<LI><A HREF="#toc77">Efficiency of grammars</A>
<LI><A HREF="#toc78">Speech input and output</A>
<LI><A HREF="#toc79">Multilingual syntax editor</A>
<LI><A HREF="#toc80">Interactive Development Environment (IDE)</A>
<LI><A HREF="#toc81">Communicating with GF</A>
<LI><A HREF="#toc82">Embedded grammars in Haskell, Java, and Prolog</A>
<LI><A HREF="#toc83">Alternative input and output grammar formats</A>
<LI><A HREF="#toc76">The simplest way</A>
<LI><A HREF="#toc77">How to find resource functions</A>
<LI><A HREF="#toc78">A functor implementation</A>
</UL>
<LI><A HREF="#toc84">Case studies</A>
<LI><A HREF="#toc79">Transfer modules</A>
<LI><A HREF="#toc80">Practical issues</A>
<UL>
<LI><A HREF="#toc85">Interfacing formal and natural languages</A>
<LI><A HREF="#toc81">Lexers and unlexers</A>
<LI><A HREF="#toc82">Efficiency of grammars</A>
<LI><A HREF="#toc83">Speech input and output</A>
<LI><A HREF="#toc84">Multilingual syntax editor</A>
<LI><A HREF="#toc85">Interactive Development Environment (IDE)</A>
<LI><A HREF="#toc86">Communicating with GF</A>
<LI><A HREF="#toc87">Embedded grammars in Haskell, Java, and Prolog</A>
<LI><A HREF="#toc88">Alternative input and output grammar formats</A>
</UL>
<LI><A HREF="#toc89">Case studies</A>
<UL>
<LI><A HREF="#toc90">Interfacing formal and natural languages</A>
</UL>
</UL>

@@ -222,7 +229,8 @@ These grammars can be used as <B>libraries</B> to define application grammars.
In this way, it is possible to write a high-quality grammar without
knowing about linguistics: in general, to write an application grammar
by using the resource library just requires practical knowledge of
the target language.
the target language, and all theoretical knowledge about its grammar
is given by the libraries.
</P>
<A NAME="toc4"></A>
<H3>Who is this tutorial for</H3>
@@ -258,9 +266,10 @@ notation (also known as BNF). The BNF format is often a good
starting point for GF grammar development, because it is
simple and widely used. However, the BNF format is not
good for multilingual grammars. While it is possible to
translate the words contained in a BNF grammar to another
language, proper translation usually involves more, e.g.
changing the word order in
"translate" by just changing the words contained in a
BNF grammar to words of some other
language, proper translation usually involves more.
For instance, the order of words may have to be changed:
</P>
<PRE>
  Italian cheese ===> formaggio italiano
@@ -279,14 +288,14 @@ Italian adjectives usually have four forms where English
has just one:
</P>
<PRE>
  delicious (wine | wines | pizza | pizzas)
  delicious (wine, wines, pizza, pizzas)
  vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose
</PRE>
<P>
The <B>morphology</B> of a language describes the
forms of its words. While the complete description of morphology
belongs to resource grammars, the tutorial will explain the
main programming concepts involved. This will moreover
belongs to resource grammars, this tutorial will explain the
programming concepts involved in morphology. This will moreover
make it possible to grow the fragment covered by the food example.
The tutorial will in fact build a toy resource grammar in order
to illustrate the module structure of library-based application
@@ -584,7 +593,7 @@ a sentence but a sequence of ten sentences.
<H3>Labelled context-free grammars</H3>
<P>
The syntax trees returned by GF's parser in the previous examples
are not so nice to look at. The identifiers of form <CODE>Mks</CODE>
are not so nice to look at. The identifiers that form the tree
are <B>labels</B> of the BNF rules. To see which label corresponds to
which rule, you can use the <CODE>print_grammar = pg</CODE> command
with the <CODE>printer</CODE> flag set to <CODE>cf</CODE> (which means context-free):
@@ -631,7 +640,7 @@ labels to each rule.
In files with the suffix <CODE>.cf</CODE>, you can prefix rules with
labels that you provide yourself - these may be more useful
than the automatically generated ones. The following is a possible
labelling of <CODE>paleolithic.cf</CODE> with nicer-looking labels.
labelling of <CODE>food.cf</CODE> with nicer-looking labels.
</P>
<PRE>
  Is. S ::= Item "is" Quality ;
@@ -661,7 +670,7 @@ With this grammar, the trees look as follows:
<IMG ALIGN="middle" SRC="Tree2.png" BORDER="0" ALT="">
</P>
<A NAME="toc17"></A>
<H2>The ``.gf`` grammar format</H2>
<H2>The .gf grammar format</H2>
<P>
To see what there is in GF's shell state when a grammar
has been imported, you can give the plain command
@@ -696,7 +705,7 @@ A GF grammar consists of two main parts:
</UL>

<P>
The EBNF and CF formats fuse these two things together, but it is possible
The CF format fuses these two things together, but it is possible
to take them apart. For instance, the sentence formation rule
</P>
<PRE>
@@ -773,7 +782,7 @@ judgement forms:
<P>
We return to the precise meanings of these judgement forms later.
First we will look at how judgements are grouped into modules, and
show how the paleolithic grammar is
show how the food grammar is
expressed by using modules and judgements.
</P>
<A NAME="toc20"></A>
@@ -950,7 +959,7 @@ A system with this property is called a <B>multilingual grammar</B>.
</P>
<P>
Multilingual grammars can be used for applications such as
translation. Let us buid an Italian concrete syntax for
translation. Let us build an Italian concrete syntax for
<CODE>Food</CODE> and then test the resulting
multilingual grammar.
</P>
@@ -1179,10 +1188,11 @@ The graph uses
<LI>square boxes for concrete modules
<LI>black-headed arrows for inheritance
<LI>white-headed arrows for the concrete-of-abstract relation
<P></P>
<IMG ALIGN="middle" SRC="Foodmarket.png" BORDER="0" ALT="">
</UL>

<P>
<IMG ALIGN="middle" SRC="Foodmarket.png" BORDER="0" ALT="">
</P>
<A NAME="toc34"></A>
<H2>System commands</H2>
<P>
@@ -1203,7 +1213,7 @@ shell escape symbol <CODE>!</CODE>. The resulting graph was shown in the previou
<P>
The command <CODE>print_multi = pm</CODE> is used for printing the current multilingual
grammar in various formats, of which the format <CODE>-printer=graph</CODE> just
shows the module dependencies. Use the <CODE>help</CODE> to see what other formats
shows the module dependencies. Use <CODE>help</CODE> to see what other formats
are available:
</P>
<PRE>
@@ -1216,9 +1226,9 @@ are available:
<A NAME="toc36"></A>
<H3>The golden rule of functional programming</H3>
<P>
In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format still looks rather
In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format looks rather
verbose, and demands lots more characters to be written. You have probably
done this by the copy-paste-modify method, which is a standard way to
done this by the copy-paste-modify method, which is a common way to
avoid repeating work.
</P>
<P>
@@ -1232,8 +1242,8 @@ method. The <B>golden rule of functional programming</B> says that
<P>
A function separates the shared parts of different computations from the
changing parts, parameters. In functional programming languages, such as
<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share muc more than in
the languages such as C and Java.
<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share much more than in
languages such as C and Java.
</P>
<A NAME="toc37"></A>
<H3>Operation definitions</H3>
@@ -1283,11 +1293,8 @@ strings and records.
  resource StringOper = {
    oper
      SS : Type = {s : Str} ;

      ss : Str -> SS = \x -> {s = x} ;

      cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;

      prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
  }
</PRE>
@@ -1433,7 +1440,7 @@ forms of a word are formed.
</P>
<P>
From GF point of view, a paradigm is a function that takes a <B>lemma</B> -
a string also known as a <B>dictionary form</B> - and returns an inflection
also known as a <B>dictionary form</B> - and returns an inflection
table of desired type. Paradigms are not functions in the sense of the
<CODE>fun</CODE> judgements of abstract syntax (which operate on trees and not
on strings), but operations defined in <CODE>oper</CODE> judgements.
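<P>
As a minimal sketch of a paradigm written as an <CODE>oper</CODE>
(assuming the linearization type <CODE>Noun = {s : Number => Str}</CODE>
used in this tutorial):
</P>
<PRE>
  oper regNoun : Str -> Noun = \x -> {
    s = table {
      Sg => x ;
      Pl => x + "s"
      }
    } ;
</PRE>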
@@ -1457,13 +1464,13 @@ are written together to form one <B>token</B>. Thus, for instance,
</PRE>
<P></P>
<A NAME="toc44"></A>
<H3>Worst-case macros and data abstraction</H3>
<H3>Worst-case functions and data abstraction</H3>
<P>
Some English nouns, such as <CODE>mouse</CODE>, are so irregular that
it makes no sense to see them as instances of a paradigm. Even
then, it is useful to perform <B>data abstraction</B> from the
definition of the type <CODE>Noun</CODE>, and introduce a constructor
operation, a <B>worst-case macro</B> for nouns:
operation, a <B>worst-case function</B> for nouns:
</P>
<PRE>
  oper mkNoun : Str -> Str -> Noun = \x,y -> {
@@ -1490,7 +1497,7 @@ and
instead of writing the inflection table explicitly.
</P>
<P>
The grammar engineering advantage of worst-case macros is that
The grammar engineering advantage of worst-case functions is that
the author of the resource module may change the definitions of
<CODE>Noun</CODE> and <CODE>mkNoun</CODE>, and still retain the
interface (i.e. the system of type signatures) that makes it
@@ -1498,7 +1505,7 @@ correct to use these functions in concrete modules. In programming
terms, <CODE>Noun</CODE> is then treated as an <B>abstract datatype</B>.
</P>
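<P>
For illustration, the paradigms can then be re-expressed through the
worst-case function, so that they no longer mention the record structure
of <CODE>Noun</CODE> at all (a sketch along the lines of this section):
</P>
<PRE>
  oper regNoun : Str -> Noun = \x -> mkNoun x (x + "s") ;
  oper mouse_N : Noun = mkNoun "mouse" "mice" ;
</PRE>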
<A NAME="toc45"></A>
<H3>A system of paradigms using ``Prelude`` operations</H3>
<H3>A system of paradigms using Prelude operations</H3>
<P>
In addition to the completely regular noun paradigm <CODE>regNoun</CODE>,
some other frequent noun paradigms deserve to be
@@ -1707,7 +1714,7 @@ The rule of subject-verb agreement in English says that the verb
phrase must be inflected in the number of the subject. This
means that a noun phrase (functioning as a subject), inherently
<I>has</I> a number, which it passes to the verb. The verb does not
<I>have</I> a number, but must be able to receive whatever number the
<I>have</I> a number, but must be able to <I>receive</I> whatever number the
subject has. This distinction is nicely represented by the
different linearization types of <B>noun phrases</B> and <B>verb phrases</B>:
</P>
@@ -1717,7 +1724,7 @@ different linearization types of <B>noun phrases</B> and <B>verb phrases</B>:
</PRE>
<P>
We say that the number of <CODE>NP</CODE> is an <B>inherent feature</B>,
whereas the number of <CODE>VP</CODE> is <B>parametric</B>.
whereas the number of <CODE>VP</CODE> is a <B>variable feature</B> (or a
<B>parametric feature</B>).
</P>
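<P>
A minimal sketch of the two linearization types and of a predication
rule that performs the agreement (the exact field names are
illustrative):
</P>
<PRE>
  lincat NP = {s : Str ; n : Number} ;  -- number is inherent
  lincat VP = {s : Number => Str} ;     -- number is variable

  lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
</PRE>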
<P>
The agreement rule itself is expressed in the linearization rule of
@@ -1823,7 +1831,7 @@ Here is an example of pattern matching, the paradigm of regular adjectives.
  }
</PRE>
<P>
A constructor can have patterns as arguments. For instance,
A constructor can be used as a pattern that has patterns as arguments. For instance,
the adjectival paradigm in which the two singular forms are the same,
can be defined
</P>
@@ -1837,9 +1845,9 @@
<A NAME="toc54"></A>
<H3>Morphological analysis and morphology quiz</H3>
<P>
Even though in GF morphology
is mostly seen as an auxiliary of syntax, a morphology once defined
can be used on its own right. The command <CODE>morpho_analyse = ma</CODE>
Even though morphology is in GF
mostly used as an auxiliary for syntax, it
can also be useful in its own right. The command <CODE>morpho_analyse = ma</CODE>
can be used to read a text and return for each word the analyses that
it has in the current concrete syntax.
</P>
@@ -1865,11 +1873,12 @@ the category is set to be something else than <CODE>S</CODE>. For instance,
  Score 0/1
</PRE>
<P>
Finally, a list of morphological exercises and save it in a
Finally, a list of morphological exercises can be generated
off-line and saved in a
file for later use, by the command <CODE>morpho_list = ml</CODE>
</P>
<PRE>
  > morpho_list -number=25 -cat=V
  > morpho_list -number=25 -cat=V | wf exx.txt
</PRE>
<P>
The <CODE>number</CODE> flag gives the number of exercises generated.
@@ -1884,25 +1893,36 @@ a sentence may place the object between the verb and the particle:
<I>he switched it off</I>.
</P>
<P>
The first of the following judgements defines transitive verbs as
The following judgement defines transitive verbs as
<B>discontinuous constituents</B>, i.e. as having a linearization
type with two strings and not just one. The second judgement
type with two strings and not just one.
</P>
<PRE>
  lincat TV = {s : Number => Str ; part : Str} ;
</PRE>
<P>
This linearization rule
shows how the constituents are separated by the object in complementization.
</P>
<PRE>
  lincat TV = {s : Number => Str ; part : Str} ;
  lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ;
</PRE>
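<P>
A lexical entry for such a verb then puts the particle in a field of its
own (a hypothetical entry, for illustration):
</P>
<PRE>
  lin Switch_off = {
    s = table {Sg => "switches" ; Pl => "switch"} ;
    part = "off"
    } ;
</PRE>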
<P>
There is no restriction in the number of discontinuous constituents
(or other fields) a <CODE>lincat</CODE> may contain. The only condition is that
the fields must be of finite types, i.e. built from records, tables,
parameters, and <CODE>Str</CODE>, and not functions. A mathematical result
parameters, and <CODE>Str</CODE>, and not functions.
</P>
<P>
A mathematical result
about parsing in GF says that the worst-case complexity of parsing
increases with the number of discontinuous constituents. Moreover,
the parsing and linearization commands only give reliable results
for categories whose linearization type has a unique <CODE>Str</CODE> valued
field labelled <CODE>s</CODE>.
increases with the number of discontinuous constituents. This is
potentially a reason to avoid discontinuous constituents.
Moreover, the parsing and linearization commands only give accurate
results for categories whose linearization type has a unique <CODE>Str</CODE>
valued field labelled <CODE>s</CODE>. Therefore, discontinuous constituents
are not a good idea in top-level categories accessed by the users
of a grammar application.
</P>
<A NAME="toc56"></A>
<H2>More constructs for concrete syntax</H2>
@@ -1953,8 +1973,25 @@ can be used e.g. if a word lacks a certain form.
In general, <CODE>variants</CODE> should be used cautiously. It is not
recommended for modules aimed to be libraries, because the
user of the library has no way to choose among the variants.
Moreover, even though <CODE>variants</CODE> admits lists of any type,
its semantics for complex types can cause surprises.
Moreover, <CODE>variants</CODE> is only defined for basic types (<CODE>Str</CODE>
and parameter types). The grammar compiler will admit
<CODE>variants</CODE> for any types, but it will push it to the
level of basic types in a way that may be unwanted.
For instance, German has two words meaning "car",
<I>Wagen</I>, which is Masculine, and <I>Auto</I>, which is Neuter.
However, if one writes
</P>
<PRE>
  variants {{s = "Wagen" ; g = Masc} ; {s = "Auto" ; g = Neutr}}
</PRE>
<P>
this will compute to
</P>
<PRE>
  {s = variants {"Wagen" ; "Auto"} ; g = variants {Masc ; Neutr}}
</PRE>
<P>
which will also accept erroneous combinations of strings and genders.
</P>
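<P>
One way to stay safe is to avoid <CODE>variants</CODE> at the record level
altogether and give each word a rule of its own (a sketch; the function
names are invented):
</P>
<PRE>
  lin Car1 = {s = "Wagen" ; g = Masc} ;
  lin Car2 = {s = "Auto" ; g = Neutr} ;
</PRE>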
<A NAME="toc59"></A>
<H3>Record extension and subtyping</H3>
@@ -2039,9 +2076,6 @@ possible to write, slightly surprisingly,
<A NAME="toc62"></A>
<H3>Regular expression patterns</H3>
<P>
(New since 7 January 2006.)
</P>
<P>
To define string operations computed at compile time, such
as in morphology, it is handy to use regular expression patterns:
</P>
@@ -2076,7 +2110,6 @@ Another example: English noun plural formation.
    x + "y" => x + "ies" ;
    _ => w + "s"
  } ;

</PRE>
<P>
Semantics: variables are always bound to the <B>first match</B>, which is the first
@@ -2085,8 +2118,10 @@ in the sequence of binding lists <CODE>Match p v</CODE> defined as follows. In t
<PRE>
  Match (p1|p2) v = Match p1 v ++ Match p2 v
  Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
  Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
  Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 |
                      i <- [0..length s], (s1,s2) = splitAt i s]
  Match p* s = [[]] if Match "" s ++ Match p s ++ Match (p+p) s ++... /= []
  Match -p v = [[]] if Match p v = []
  Match c v = [[]] if c == v -- for constant and literal patterns c
  Match x v = [[(x,v)]] -- for variable patterns x
  Match x@p v = [[(x,v)]] + M if M = Match p v /= []
@@ -2097,14 +2132,18 @@ Examples:
</P>
<UL>
<LI><CODE>x + "e" + y</CODE> matches <CODE>"peter"</CODE> with <CODE>x = "p", y = "ter"</CODE>
<LI><CODE>x@("foo"*)</CODE> matches any token with <CODE>x = ""</CODE>
<LI><CODE>x + y@("er"*)</CODE> matches <CODE>"burgerer"</CODE> with <CODE>x = "burg", y = "erer"</CODE>
<LI><CODE>x + "er"*</CODE> matches <CODE>"burgerer"</CODE> with <CODE>x = "burg"</CODE>
</UL>

<A NAME="toc63"></A>
<H3>Prefix-dependent choices</H3>
<P>
The construct exemplified in
Sometimes a token has different forms depending on the token
that follows. An example is the English indefinite article,
which is <I>an</I> if a vowel follows, <I>a</I> otherwise.
Which form is chosen can only be decided at run time, i.e.
when a string is actually built. GF has a special construct for
such tokens, the <CODE>pre</CODE> construct exemplified in
</P>
<PRE>
  oper artIndef : Str =
@@ -2152,22 +2191,61 @@ they can be used as arguments. For example:

  -- e.g. (StreetAddress 10 "Downing Street") : Address
</PRE>
<P></P>
<P>
The linearization type is <CODE>{s : Str}</CODE> for all these categories.
</P>
<A NAME="toc65"></A>
<H2>More features of the module system</H2>
<H2>More concepts of abstract syntax</H2>
<A NAME="toc66"></A>
<H3>Interfaces, instances, and functors</H3>
<H3>GF as a logical framework</H3>
<P>
In this section, we will show how
to encode advanced semantic concepts in an abstract syntax.
We use concepts inherited from <B>type theory</B>. Type theory
is the basis of many systems known as <B>logical frameworks</B>, which are
used for representing mathematical theorems and their proofs on a computer.
In fact, GF has a logical framework as its proper part:
this part is the abstract syntax.
</P>
<P>
In a logical framework, the formalization of a mathematical theory
is a set of type and function declarations. The following is an example
of such a theory, represented as an <CODE>abstract</CODE> module in GF.
</P>
<PRE>
  abstract Geometry = {
    cat
      Line ; Point ; Circle ; -- basic types of figures
      Prop ; -- proposition
    fun
      Parallel : Line -> Line -> Prop ; -- x is parallel to y
      Centre : Circle -> Point ; -- the centre of c
  }
</PRE>
<P></P>
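<P>
A concrete syntax for such a theory is written like any other concrete
module (a minimal English sketch, not included in the tutorial files):
</P>
<PRE>
  concrete GeometryEng of Geometry = {
    lincat Line, Point, Circle, Prop = {s : Str} ;
    lin
      Parallel x y = {s = x.s ++ "is parallel to" ++ y.s} ;
      Centre c = {s = "the centre of" ++ c.s} ;
  }
</PRE>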
<A NAME="toc67"></A>
<H3>Dependent types</H3>
<A NAME="toc68"></A>
<H3>Higher-order abstract syntax</H3>
<A NAME="toc69"></A>
<H3>Semantic definitions</H3>
<A NAME="toc70"></A>
<H3>List categories</H3>
<A NAME="toc71"></A>
<H2>More features of the module system</H2>
<A NAME="toc72"></A>
<H3>Interfaces, instances, and functors</H3>
<A NAME="toc73"></A>
<H3>Resource grammars and their reuse</H3>
<P>
A resource grammar is a grammar built on linguistic grounds,
to describe a language rather than a domain.
The GF resource grammar library contains resource grammars for
The GF resource grammar library, which contains resource grammars for
10 languages, is described more closely in the following
documents:
</P>
<UL>
<LI><A HREF="../../lib/resource/doc/gf-resource.html">Resource library API documentation</A>:
<LI><A HREF="../../lib/resource-1.0/doc/">Resource library API documentation</A>:
for application grammarians using the resource.
<LI><A HREF="../../lib/resource-1.0/doc/Resource-HOWTO.html">Resource writing HOWTO</A>:
for resource grammarians developing the resource.
@@ -2177,21 +2255,41 @@ documents:
However, to give a flavour of both using and writing resource grammars,
we have created a miniature resource, which resides in the
subdirectory <A HREF="resource"><CODE>resource</CODE></A>. Its API consists of the following
modules:
three modules:
</P>
<UL>
<LI><A HREF="resource/Syntax.gf">Syntax</A>: syntactic structures, language-independent
<LI><A HREF="resource/LexEng.gf">LexEng</A>: lexical paradigms, English
<LI><A HREF="resource/LexIta.gf">LexIta</A>: lexical paradigms, Italian
</UL>

<P>
<A HREF="resource/Syntax.gf">Syntax</A> - syntactic structures, language-independent:
</P>
<PRE>

</PRE>
<P>
<A HREF="resource/LexEng.gf">LexEng</A> - lexical paradigms, English:
</P>
<PRE>

</PRE>
<P>
<A HREF="resource/LexIta.gf">LexIta</A> - lexical paradigms, Italian:
</P>
<PRE>

</PRE>
<P></P>
<P>
Only these three modules should be <CODE>open</CODE>ed in applications.
The implementations of the resource are given in the following four modules:
</P>
<P>
<A HREF="resource/MorphoEng.gf">MorphoEng</A>,
</P>
<PRE>

</PRE>
<P>
<A HREF="resource/MorphoIta.gf">MorphoIta</A>: low-level morphology
</P>
<UL>
<LI><A HREF="resource/MorphoEng.gf">MorphoEng</A>,
<A HREF="resource/MorphoIta.gf">MorphoIta</A>: low-level morphology
<LI><A HREF="resource/SyntaxEng.gf">SyntaxEng</A>,
<A HREF="resource/SyntaxIta.gf">SyntaxIta</A>: definitions of syntactic structures
</UL>
@@ -2210,19 +2308,181 @@ The rest of the modules (black) come from the resource.
<P>
<IMG ALIGN="middle" SRC="Multi.png" BORDER="0" ALT="">
</P>
<A NAME="toc68"></A>
<H3>Restricted inheritance and qualified opening</H3>
<A NAME="toc69"></A>
<H2>More concepts of abstract syntax</H2>
<A NAME="toc70"></A>
<H3>Dependent types</H3>
<A NAME="toc71"></A>
<H3>Higher-order abstract syntax</H3>
<A NAME="toc72"></A>
<H3>Semantic definitions</H3>
<A NAME="toc73"></A>
<H3>List categories</H3>
<A NAME="toc74"></A>
<H3>Restricted inheritance and qualified opening</H3>
<A NAME="toc75"></A>
<H2>Using the standard resource library</H2>
<P>
The example files of this chapter can be found in
the directory <A HREF="./arithm"><CODE>arithm</CODE></A>.
</P>
<A NAME="toc76"></A>
<H3>The simplest way</H3>
<P>
The simplest way is to <CODE>open</CODE> a top-level <CODE>Lang</CODE> module
and a <CODE>Paradigms</CODE> module:
</P>
<PRE>
  abstract Foo = ...

  concrete FooEng = open LangEng, ParadigmsEng in ...
  concrete FooSwe = open LangSwe, ParadigmsSwe in ...
</PRE>
<P>
Here is an example.
</P>
<PRE>
  abstract Arithm = {
    cat
      Prop ;
      Nat ;
    fun
      Zero : Nat ;
      Succ : Nat -> Nat ;
      Even : Nat -> Prop ;
      And : Prop -> Prop -> Prop ;
  }

  --# -path=.:alltenses:prelude

  concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
    lincat
      Prop = S ;
      Nat = NP ;
    lin
      Zero =
        UsePN (regPN "zero" nonhuman) ;
      Succ n =
        DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
      Even n =
        UseCl TPres ASimul PPos
          (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
      And x y =
        ConjS and_Conj (BaseS x y) ;

  }

  --# -path=.:alltenses:prelude

  concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
    lincat
      Prop = S ;
      Nat = NP ;
    lin
      Zero =
        UsePN (regPN "noll" neutrum) ;
      Succ n =
        DetCN (DetSg (SgQuant DefArt) NoOrd)
          (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare")
            (mkPreposition "till")) n) ;
      Even n =
        UseCl TPres ASimul PPos
          (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
      And x y =
        ConjS and_Conj (BaseS x y) ;
  }
</PRE>
<P></P>
<A NAME="toc77"></A>
<H3>How to find resource functions</H3>
<P>
The definitions in this example were found by parsing:
</P>
<PRE>
  > i LangEng.gf

  -- for Successor:
  > p -cat=NP -mcfg -parser=topdown "the mother of Paris"

  -- for Even:
  > p -cat=S -mcfg -parser=topdown "Paris is old"

  -- for And:
  > p -cat=S -mcfg -parser=topdown "Paris is old and I am old"
</PRE>
<P>
The use of parsing can be systematized by <B>example-based grammar writing</B>,
to which we will return later.
</P>
<A NAME="toc78"></A>
<H3>A functor implementation</H3>
<P>
The interesting thing now is that the
code in <CODE>ArithmSwe</CODE> is similar to the code in <CODE>ArithmEng</CODE>, except for
some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
"jämn" vs. "even"). How can we exploit the similarities and
actually share code between the languages?
</P>
<P>
The solution is to use a functor: an <CODE>incomplete</CODE> module that opens
an <CODE>abstract</CODE> as an <CODE>interface</CODE>, and is then instantiated to the different
languages that implement the interface. The structure is as follows:
</P>
<PRE>
  abstract Foo ...

  incomplete concrete FooI = open Lang, Lex in ...

  concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ;
  concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ;
</PRE>
<P>
where <CODE>Lex</CODE> is an abstract lexicon that includes the vocabulary
specific to this application:
</P>
<PRE>
  abstract Lex = Cat ** ...

  concrete LexEng of Lex = CatEng ** open ParadigmsEng in ...
  concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...
</PRE>
<P>
Here, again, a complete example (<CODE>abstract Arithm</CODE> is as above):
</P>
|
||||
<PRE>
incomplete concrete ArithmI of Arithm = open Lang, Lex in {
  lincat
    Prop = S ;
    Nat = NP ;
  lin
    Zero =
      UsePN zero_PN ;
    Succ n =
      DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
    Even n =
      UseCl TPres ASimul PPos
        (PredVP n (UseComp (CompAP (PositA even_A)))) ;
    And x y =
      ConjS and_Conj (BaseS x y) ;
}

--# -path=.:alltenses:prelude
concrete ArithmEng of Arithm = ArithmI with
  (Lang = LangEng),
  (Lex = LexEng) ;

--# -path=.:alltenses:prelude
concrete ArithmSwe of Arithm = ArithmI with
  (Lang = LangSwe),
  (Lex = LexSwe) ;

abstract Lex = Cat ** {
  fun
    zero_PN : PN ;
    successor_N2 : N2 ;
    even_A : A ;
}

concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
  lin
    zero_PN = regPN "noll" neutrum ;
    successor_N2 =
      mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
    even_A = regA "jämn" ;
}
</PRE>
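<P>
Note that the example spells out <CODE>LexSwe</CODE> but not the English
instance <CODE>LexEng</CODE>, which <CODE>ArithmEng</CODE> refers to. Here is a
sketch of what it could look like; the <CODE>ParadigmsEng</CODE> operations
assumed here (<CODE>regPN</CODE>, <CODE>regN</CODE>, <CODE>mkN2</CODE>,
<CODE>mkPrep</CODE>, <CODE>regA</CODE>) may have different names or type
signatures in your version of the resource library.
</P>
<PRE>
concrete LexEng of Lex = CatEng ** open ParadigmsEng in {
  lin
    zero_PN = regPN "zero" ;
    successor_N2 = mkN2 (regN "successor") (mkPrep "of") ;
    even_A = regA "even" ;
}
</PRE>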
<P></P>
<A NAME="toc79"></A>
<H2>Transfer modules</H2>
<P>
Transfer means noncompositional tree-transforming operations.
@@ -2241,9 +2501,9 @@ See the
<A HREF="../transfer.html">transfer language documentation</A>
for more information.
</P>
<A NAME="toc75"></A>
<A NAME="toc80"></A>
<H2>Practical issues</H2>
<A NAME="toc76"></A>
<A NAME="toc81"></A>
<H3>Lexers and unlexers</H3>
<P>
Lexers and unlexers can be chosen from
@@ -2279,7 +2539,7 @@ Given by <CODE>help -lexer</CODE>, <CODE>help -unlexer</CODE>:

</PRE>
<P></P>
<A NAME="toc77"></A>
<A NAME="toc82"></A>
<H3>Efficiency of grammars</H3>
<P>
Issues:
@@ -2290,7 +2550,7 @@ Issues:
<LI>parsing efficiency: <CODE>-mcfg</CODE> vs. others
</UL>

<A NAME="toc78"></A>
<A NAME="toc83"></A>
<H3>Speech input and output</H3>
<P>
The <CODE>speak_aloud = sa</CODE> command sends a string to the speech
@@ -2320,7 +2580,7 @@ The method works only for grammars of English.
Both Flite and ATK are freely available through the links
above, but they are not distributed together with GF.
</P>
<A NAME="toc79"></A>
<A NAME="toc84"></A>
<H3>Multilingual syntax editor</H3>
<P>
The
@@ -2337,12 +2597,12 @@ Here is a snapshot of the editor:
The grammars of the snapshot are from the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/examples/letter">Letter grammar package</A>.
</P>
<A NAME="toc80"></A>
<A NAME="toc85"></A>
<H3>Interactive Development Environment (IDE)</H3>
<P>
Forthcoming.
</P>
<A NAME="toc81"></A>
<A NAME="toc86"></A>
<H3>Communicating with GF</H3>
<P>
Other processes can communicate with the GF command interpreter,
@@ -2359,7 +2619,7 @@ Thus the most silent way to invoke GF is
</PRE>
</UL>

<A NAME="toc82"></A>
<A NAME="toc87"></A>
<H3>Embedded grammars in Haskell, Java, and Prolog</H3>
<P>
GF grammars can be used as parts of programs written in the
@@ -2371,15 +2631,15 @@ following languages. The links give more documentation.
<LI><A HREF="http://www.cs.chalmers.se/~peb/software.html">Prolog</A>
</UL>

<A NAME="toc83"></A>
<A NAME="toc88"></A>
<H3>Alternative input and output grammar formats</H3>
<P>
A summary is given in the following chart of GF grammar compiler phases:
<IMG ALIGN="middle" SRC="../gf-compiler.png" BORDER="0" ALT="">
</P>
<A NAME="toc84"></A>
<A NAME="toc89"></A>
<H2>Case studies</H2>
<A NAME="toc85"></A>
<A NAME="toc90"></A>
<H3>Interfacing formal and natural languages</H3>
<P>
<A HREF="http://www.cs.chalmers.se/~krijo/thesis/thesisA4.pdf">Formal and Informal Software Specifications</A>,
@@ -2392,6 +2652,6 @@ English and German.
A simpler example will be explained here.
</P>

<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc gf-tutorial2.txt -->
</BODY></HTML>

@@ -7,9 +7,56 @@
<P ALIGN="center"><CENTER><H1>Resource grammar writing HOWTO</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR>
Last update: Fri May 26 17:36:48 2006
Last update: Fri Jun 16 00:59:52 2006
</FONT></CENTER>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">The resource grammar API</A>
<UL>
<LI><A HREF="#toc2">Phrase category modules</A>
<LI><A HREF="#toc3">Infrastructure modules</A>
<LI><A HREF="#toc4">Lexical modules</A>
</UL>
<LI><A HREF="#toc5">Language-dependent syntax modules</A>
<LI><A HREF="#toc6">The core of the syntax</A>
<UL>
<LI><A HREF="#toc7">Another reduced API</A>
<LI><A HREF="#toc8">The present-tense fragment</A>
</UL>
<LI><A HREF="#toc9">Phases of the work</A>
<UL>
<LI><A HREF="#toc10">Putting up a directory</A>
<LI><A HREF="#toc11">Direction of work</A>
<LI><A HREF="#toc12">The develop-test cycle</A>
<LI><A HREF="#toc13">Resource modules used</A>
<LI><A HREF="#toc14">Morphology and lexicon</A>
<LI><A HREF="#toc15">Lock fields</A>
<LI><A HREF="#toc16">Lexicon construction</A>
</UL>
<LI><A HREF="#toc17">Inside grammar modules</A>
<UL>
<LI><A HREF="#toc18">The category system</A>
<LI><A HREF="#toc19">Phrase category modules</A>
<LI><A HREF="#toc20">Resource modules</A>
<LI><A HREF="#toc21">Lexicon</A>
</UL>
<LI><A HREF="#toc22">Lexicon extension</A>
<UL>
<LI><A HREF="#toc23">The irregularity lexicon</A>
<LI><A HREF="#toc24">Lexicon extraction from a word list</A>
<LI><A HREF="#toc25">Lexicon extraction from raw text data</A>
<LI><A HREF="#toc26">Extending the resource grammar API</A>
</UL>
<LI><A HREF="#toc27">Writing an instance of parametrized resource grammar implementation</A>
<LI><A HREF="#toc28">Parametrizing a resource grammar implementation</A>
</UL>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
The purpose of this document is to tell how to implement the GF
resource grammar API for a new language. We will <I>not</I> cover how
@@ -17,23 +64,43 @@ to use the resource grammar, nor how to change the API. But we
will give some hints how to extend the API.
</P>
<P>
<B>Notice</B>. This document concerns the API v. 1.0 which has not
yet been released. You can find the current code
in <A HREF=".."><CODE>GF/lib/resource-1.0/</CODE></A>. See the
<A HREF="../README"><CODE>resource-1.0/README</CODE></A> for
A manual for using the resource grammar is found in
</P>
<P>
<A HREF="../../../doc/resource.pdf"><CODE>http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf</CODE></A>.
</P>
<P>
A tutorial on GF, also introducing the idea of resource grammars, is found in
</P>
<P>
<A HREF="../../../doc/tutorial/gf-tutorial2.html"><CODE>http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html</CODE></A>.
</P>
<P>
This document concerns the API v. 1.0. You can find the current code in
</P>
<P>
<A HREF=".."><CODE>http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/</CODE></A>
</P>
<P>
See the <A HREF="../README"><CODE>README</CODE></A> for
details on how this differs from previous versions.
</P>
<A NAME="toc1"></A>
<H2>The resource grammar API</H2>
<P>
The API is divided into a bunch of <CODE>abstract</CODE> modules.
The following figure gives the dependencies of these modules.
</P>
<P>
<IMG ALIGN="left" SRC="Lang.png" BORDER="0" ALT="">
<IMG ALIGN="left" SRC="Grammar.png" BORDER="0" ALT="">
</P>
<P>
The module structure is rather flat: almost every module is a direct
parent of the top module <CODE>Lang</CODE>. The idea
Thus the API consists of a grammar and a lexicon, which is
provided for test purposes.
</P>
<P>
The module structure is rather flat: most modules are direct
parents of <CODE>Grammar</CODE>. The idea
is that you can concentrate on one linguistic aspect at a time, or
also distribute the work among several authors. The module <CODE>Cat</CODE>
defines the "glue" that ties the aspects together - a type system
@@ -41,6 +108,7 @@ to which all the other modules conform, so that e.g. <CODE>NP</CODE> means
the same thing in those modules that use <CODE>NP</CODE>s and those that
construct them.
</P>
<A NAME="toc2"></A>
<H3>Phrase category modules</H3>
<P>
The direct parents of the top will be called <B>phrase category modules</B>,
@@ -65,6 +133,7 @@ one of a small number of different types). Thus we have
<LI><CODE>Idiom</CODE>: idiomatic phrases such as existentials
</UL>

<A NAME="toc3"></A>
<H3>Infrastructure modules</H3>
<P>
Expressions of each phrase category are constructed in the corresponding
@@ -93,6 +162,7 @@ can skip the <CODE>lincat</CODE> definition of a category and use the default
<CODE>{s : Str}</CODE> until you need to change it to something else. In
English, for instance, many categories do have this linearization type.
</P>
<A NAME="toc4"></A>
<H3>Lexical modules</H3>
<P>
What is lexical and what is syntactic is not as clearcut in GF as in
@@ -129,6 +199,45 @@ different languages on the level of a resource grammar. In other words,
application grammars are likely to use the resource in different ways for
different languages.
</P>
<A NAME="toc5"></A>
<H2>Language-dependent syntax modules</H2>
<P>
In addition to the common API, there is room for language-dependent extensions
of the resource. The top level of each language looks as follows (with English as an example):
</P>
<PRE>
abstract English = Grammar, ExtraEngAbs, DictEngAbs
</PRE>
<P>
where <CODE>ExtraEngAbs</CODE> is a collection of syntactic structures specific to English,
and <CODE>DictEngAbs</CODE> is an English dictionary
(at the moment, it consists of <CODE>IrregEngAbs</CODE>,
the irregular verbs of English). Each of these language-specific grammars has
the potential to grow into a full-scale grammar of the language. These grammars
can also be used as libraries, but the possibility of using functors is lost.
</P>
<P>
To give a better overview of language-specific structures,
modules like <CODE>ExtraEngAbs</CODE>
are built from a language-independent module <CODE>ExtraAbs</CODE>
by restricted inheritance:
</P>
<PRE>
abstract ExtraEngAbs = Extra [f,g,...]
</PRE>
<P>
Thus any category and function in <CODE>Extra</CODE> may be shared by a subset of all
languages. One can see this set-up as a matrix, which tells
what <CODE>Extra</CODE> structures
are implemented in what languages. For the common API in <CODE>Grammar</CODE>, the matrix
is filled with 1's (everything is implemented in every language).
</P>
<P>
In a minimal resource grammar implementation, the language-dependent
extensions are just empty modules, but it is good to provide them for
the sake of uniformity.
</P>
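<P>
For instance, for a hypothetical new language <CODE>Xxx</CODE> (the module
names here are purely illustrative, and the exact header syntax may vary
between library versions), such empty placeholder modules could be sketched as:
</P>
<PRE>
abstract ExtraXxxAbs = Cat ** { }

concrete ExtraXxx of ExtraXxxAbs = CatXxx ** { }
</PRE>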
<A NAME="toc6"></A>
<H2>The core of the syntax</H2>
<P>
Among all categories and functions, a handful are
@@ -153,6 +262,7 @@ rules relate the categories to each other. It is intended to be a
first approximation when designing the parameter system of a new
language.
</P>
<A NAME="toc7"></A>
<H3>Another reduced API</H3>
<P>
If you want to experiment with a small subset of the resource API first,
@@ -161,6 +271,7 @@ try out the module
explained in the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html">GF Tutorial</A>.
</P>
<A NAME="toc8"></A>
<H3>The present-tense fragment</H3>
<P>
Some lines in the resource library are suffixed with the comment
@@ -176,7 +287,9 @@ implementation. To compile a grammar with present-tense-only, use
i -preproc=GF/lib/resource-1.0/mkPresent LangGer.gf
</PRE>
<P></P>
<A NAME="toc9"></A>
<H2>Phases of the work</H2>
<A NAME="toc10"></A>
<H3>Putting up a directory</H3>
<P>
Unless you are writing an instance of a parametrized implementation
@@ -262,6 +375,7 @@ as e.g. <CODE>VerbGer</CODE>.
<P>
<IMG ALIGN="middle" SRC="German.png" BORDER="0" ALT="">
</P>
<A NAME="toc11"></A>
<H3>Direction of work</H3>
<P>
The real work starts now. There are many ways to proceed, the main ones being
@@ -360,6 +474,7 @@ and dependences there are in your language, and you can now produce very
much in the order you please.
</OL>

<A NAME="toc12"></A>
<H3>The develop-test cycle</H3>
<P>
The following develop-test cycle will
@@ -416,6 +531,7 @@ follow soon. (You will find out that these explanations involve
a rational reconstruction of the live process! Among other things, the
API was changed during the actual process to make it more intuitive.)
</P>
<A NAME="toc13"></A>
<H3>Resource modules used</H3>
<P>
These modules will be written by you.
@@ -472,6 +588,7 @@ almost everything. This led in practice to the duplication of almost
all code on the <CODE>lin</CODE> and <CODE>oper</CODE> levels, and made the code
hard to understand and maintain.
</P>
<A NAME="toc14"></A>
<H3>Morphology and lexicon</H3>
<P>
The paradigms needed to implement
@@ -542,6 +659,7 @@ These constants are defined in terms of parameter types and constructors
in <CODE>ResGer</CODE> and <CODE>MorphoGer</CODE>, modules which are not
visible to the application grammarian.
</P>
<A NAME="toc15"></A>
<H3>Lock fields</H3>
<P>
An important difference between <CODE>MorphoGer</CODE> and
@@ -588,6 +706,7 @@ in her hidden definitions of constants in <CODE>Paradigms</CODE>. For instance,
-- mkAdv s = {s = s ; lock_Adv = <>} ;
</PRE>
<P></P>
<A NAME="toc16"></A>
<H3>Lexicon construction</H3>
<P>
The lexicon belonging to <CODE>LangGer</CODE> consists of two modules:
@@ -607,17 +726,20 @@ the coverage of the paradigms gets thereby tested and that the
use of the paradigms in <CODE>LexiconGer</CODE> gives a good set of examples for
those who want to build new lexica.
</P>
<A NAME="toc17"></A>
<H2>Inside grammar modules</H2>
<P>
Detailed implementation tricks
are found in the comments of each module.
</P>
<A NAME="toc18"></A>
<H3>The category system</H3>
<UL>
<LI><A HREF="gfdoc/Common.html">Common</A>, <A HREF="../common/CommonX.gf">CommonX</A>
<LI><A HREF="gfdoc/Cat.html">Cat</A>, <A HREF="../german/CatGer.gf">CatGer</A>
</UL>

<A NAME="toc19"></A>
<H3>Phrase category modules</H3>
<UL>
<LI><A HREF="gfdoc/Noun.html">Noun</A>, <A HREF="../german/NounGer.gf">NounGer</A>
@@ -635,6 +757,7 @@ are found in the comments of each module.
<LI><A HREF="gfdoc/Lang.html">Lang</A>, <A HREF="../german/LangGer.gf">LangGer</A>
</UL>

<A NAME="toc20"></A>
<H3>Resource modules</H3>
<UL>
<LI><A HREF="../german/ResGer.gf">ResGer</A>
@@ -642,13 +765,16 @@ are found in the comments of each module.
<LI><A HREF="gfdoc/ParadigmsGer.html">ParadigmsGer</A>, <A HREF="../german/ParadigmsGer.gf">ParadigmsGer.gf</A>
</UL>

<A NAME="toc21"></A>
<H3>Lexicon</H3>
<UL>
<LI><A HREF="gfdoc/Structural.html">Structural</A>, <A HREF="../german/StructuralGer.gf">StructuralGer</A>
<LI><A HREF="gfdoc/Lexicon.html">Lexicon</A>, <A HREF="../german/LexiconGer.gf">LexiconGer</A>
</UL>

<A NAME="toc22"></A>
<H2>Lexicon extension</H2>
<A NAME="toc23"></A>
<H3>The irregularity lexicon</H3>
<P>
It may be handy to provide a separate module of irregular
@@ -658,6 +784,7 @@ few hundred perhaps. Building such a lexicon separately also
makes it less important to cover <I>everything</I> by the
worst-case paradigms (<CODE>mkV</CODE> etc).
</P>
<A NAME="toc24"></A>
<H3>Lexicon extraction from a word list</H3>
<P>
You can often find resources such as lists of
@@ -692,6 +819,7 @@ When using ready-made word lists, you should think about
copyright issues. Ideally, all resource grammar material should
be provided under the GNU General Public License.
</P>
<A NAME="toc25"></A>
<H3>Lexicon extraction from raw text data</H3>
<P>
This is a cheap technique to build a lexicon of thousands
@@ -699,6 +827,7 @@ of words, if text data is available in digital format.
See the <A HREF="http://www.cs.chalmers.se/~markus/FM/">Functional Morphology</A>
homepage for details.
</P>
<A NAME="toc26"></A>
<H3>Extending the resource grammar API</H3>
<P>
Sooner or later it will happen that the resource grammar API
@@ -707,6 +836,7 @@ that it does not include idiomatic expressions in a given language.
The solution then is in the first place to build language-specific
extension modules. This chapter will deal with this issue (to be completed).
</P>
<A NAME="toc27"></A>
<H2>Writing an instance of parametrized resource grammar implementation</H2>
<P>
Above we have looked at how a resource implementation is built by
@@ -726,6 +856,7 @@ the Romance family (to be completed). Here is a set of
<A HREF="http://www.cs.chalmers.se/~aarne/geocal2006.pdf">slides</A>
on the topic.
</P>
<A NAME="toc28"></A>
<H2>Parametrizing a resource grammar implementation</H2>
<P>
This is the most demanding form of resource grammar writing.
@@ -742,5 +873,5 @@ is constructed from the Finnish grammar through parametrization.
</P>

<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags Resource-HOWTO.txt -->
<!-- cmdline: txt2tags -\-toc -thtml Resource-HOWTO.txt -->
</BODY></HTML>

@@ -14,11 +14,19 @@ resource grammar API for a new language. We will //not// cover how
to use the resource grammar, nor how to change the API. But we
will give some hints how to extend the API.

A manual for using the resource grammar is found in

**Notice**. This document concerns the API v. 1.0 which has not
yet been released. You can find the current code
in [``GF/lib/resource-1.0/`` ..]. See the
[``resource-1.0/README`` ../README] for
[``http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf`` http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf].

A tutorial on GF, also introducing the idea of resource grammars, is found in

[``http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html`` ../../../doc/tutorial/gf-tutorial2.html].

This document concerns the API v. 1.0. You can find the current code in

[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/`` ..]

See the [``README`` ../README] for
details on how this differs from previous versions.


@@ -28,10 +36,13 @@ details on how this differs from previous versions.
The API is divided into a bunch of ``abstract`` modules.
The following figure gives the dependencies of these modules.

[Lang.png]
[Grammar.png]

The module structure is rather flat: almost every module is a direct
parent of the top module ``Lang``. The idea
Thus the API consists of a grammar and a lexicon, which is
provided for test purposes.

The module structure is rather flat: most modules are direct
parents of ``Grammar``. The idea
is that you can concentrate on one linguistic aspect at a time, or
also distribute the work among several authors. The module ``Cat``
defines the "glue" that ties the aspects together - a type system
@@ -127,6 +138,38 @@ application grammars are likely to use the resource in different ways for
different languages.


==Language-dependent syntax modules==

In addition to the common API, there is room for language-dependent extensions
of the resource. The top level of each language looks as follows (with English as an example):
```
abstract English = Grammar, ExtraEngAbs, DictEngAbs
```
where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
and ``DictEngAbs`` is an English dictionary
(at the moment, it consists of ``IrregEngAbs``,
the irregular verbs of English). Each of these language-specific grammars has
the potential to grow into a full-scale grammar of the language. These grammars
can also be used as libraries, but the possibility of using functors is lost.

To give a better overview of language-specific structures,
modules like ``ExtraEngAbs``
are built from a language-independent module ``ExtraAbs``
by restricted inheritance:
```
abstract ExtraEngAbs = Extra [f,g,...]
```
Thus any category and function in ``Extra`` may be shared by a subset of all
languages. One can see this set-up as a matrix, which tells
what ``Extra`` structures
are implemented in what languages. For the common API in ``Grammar``, the matrix
is filled with 1's (everything is implemented in every language).

In a minimal resource grammar implementation, the language-dependent
extensions are just empty modules, but it is good to provide them for
the sake of uniformity.


==The core of the syntax==

Among all categories and functions, a handful are