updated howto

aarne
2006-05-26 15:36:54 +00:00
parent 3c8accaa6a
commit 8d571ffce4
2 changed files with 13 additions and 84 deletions


@@ -7,55 +7,9 @@
<P ALIGN="center"><CENTER><H1>Resource grammar writing HOWTO</H1> <P ALIGN="center"><CENTER><H1>Resource grammar writing HOWTO</H1>
<FONT SIZE="4"> <FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR> <I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Wed Mar 1 16:52:09 2006 Last update: Fri May 26 17:36:48 2006
</FONT></CENTER> </FONT></CENTER>
<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">The resource grammar API</A>
<UL>
<LI><A HREF="#toc2">Phrase category modules</A>
<LI><A HREF="#toc3">Infrastructure modules</A>
<LI><A HREF="#toc4">Lexical modules</A>
</UL>
<LI><A HREF="#toc5">The core of the syntax</A>
<UL>
<LI><A HREF="#toc6">Another reduced API</A>
<LI><A HREF="#toc7">The present-tense fragment</A>
</UL>
<LI><A HREF="#toc8">Phases of the work</A>
<UL>
<LI><A HREF="#toc9">Putting up a directory</A>
<LI><A HREF="#toc10">Direction of work</A>
<LI><A HREF="#toc11">The develop-test cycle</A>
<LI><A HREF="#toc12">Resource modules used</A>
<LI><A HREF="#toc13">Morphology and lexicon</A>
<LI><A HREF="#toc14">Lock fields</A>
<LI><A HREF="#toc15">Lexicon construction</A>
</UL>
<LI><A HREF="#toc16">Inside grammar modules</A>
<UL>
<LI><A HREF="#toc17">The category system</A>
<LI><A HREF="#toc18">Phrase category modules</A>
<LI><A HREF="#toc19">Resource modules</A>
<LI><A HREF="#toc20">Lexicon</A>
</UL>
<LI><A HREF="#toc21">Lexicon extension</A>
<UL>
<LI><A HREF="#toc22">The irregularity lexicon</A>
<LI><A HREF="#toc23">Lexicon extraction from a word list</A>
<LI><A HREF="#toc24">Lexicon extraction from raw text data</A>
<LI><A HREF="#toc25">Extending the resource grammar API</A>
</UL>
<LI><A HREF="#toc26">Writing an instance of parametrized resource grammar implementation</A>
<LI><A HREF="#toc27">Parametrizing a resource grammar implementation</A>
</UL>
<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P> <P>
The purpose of this document is to tell how to implement the GF The purpose of this document is to tell how to implement the GF
resource grammar API for a new language. We will <I>not</I> cover how resource grammar API for a new language. We will <I>not</I> cover how
@@ -69,7 +23,6 @@ in <A HREF=".."><CODE>GF/lib/resource-1.0/</CODE></A>. See the
<A HREF="../README"><CODE>resource-1.0/README</CODE></A> for <A HREF="../README"><CODE>resource-1.0/README</CODE></A> for
details on how this differs from previous versions. details on how this differs from previous versions.
</P> </P>
<A NAME="toc1"></A>
<H2>The resource grammar API</H2> <H2>The resource grammar API</H2>
<P> <P>
The API is divided into a bunch of <CODE>abstract</CODE> modules. The API is divided into a bunch of <CODE>abstract</CODE> modules.
@@ -88,7 +41,6 @@ to which all the other modules conform, so that e.g. <CODE>NP</CODE> means
the same thing in those modules that use <CODE>NP</CODE>s and those that the same thing in those modules that use <CODE>NP</CODE>s and those that
construct them. construct them.
</P> </P>
<A NAME="toc2"></A>
<H3>Phrase category modules</H3> <H3>Phrase category modules</H3>
<P> <P>
The direct parents of the top will be called <B>phrase category modules</B>, The direct parents of the top will be called <B>phrase category modules</B>,
@@ -113,7 +65,6 @@ one of a small number of different types). Thus we have
<LI><CODE>Idiom</CODE>: idiomatic phrases such as existentials <LI><CODE>Idiom</CODE>: idiomatic phrases such as existentials
</UL> </UL>
<A NAME="toc3"></A>
<H3>Infrastructure modules</H3> <H3>Infrastructure modules</H3>
<P> <P>
Expressions of each phrase category are constructed in the corresponding Expressions of each phrase category are constructed in the corresponding
@@ -142,7 +93,6 @@ can skip the <CODE>lincat</CODE> definition of a category and use the default
<CODE>{s : Str}</CODE> until you need to change it to something else. In <CODE>{s : Str}</CODE> until you need to change it to something else. In
English, for instance, many categories do have this linearization type. English, for instance, many categories do have this linearization type.
</P> </P>
<A NAME="toc4"></A>
<H3>Lexical modules</H3> <H3>Lexical modules</H3>
<P> <P>
What is lexical and what is syntactic is not as clearcut in GF as in What is lexical and what is syntactic is not as clearcut in GF as in
@@ -179,7 +129,6 @@ different languages on the level of a resource grammar. In other words,
application grammars are likely to use the resource in different ways for application grammars are likely to use the resource in different ways for
different languages. different languages.
</P> </P>
<A NAME="toc5"></A>
<H2>The core of the syntax</H2> <H2>The core of the syntax</H2>
<P> <P>
Among all categories and functions, a handful are Among all categories and functions, a handful are
@@ -204,7 +153,6 @@ rules relate the categories to each other. It is intended to be a
first approximation when designing the parameter system of a new first approximation when designing the parameter system of a new
language. language.
</P> </P>
<A NAME="toc6"></A>
<H3>Another reduced API</H3> <H3>Another reduced API</H3>
<P> <P>
If you want to experiment with a small subset of the resource API first, If you want to experiment with a small subset of the resource API first,
@@ -213,7 +161,6 @@ try out the module
explained in the explained in the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html">GF Tutorial</A>. <A HREF="http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html">GF Tutorial</A>.
</P> </P>
<A NAME="toc7"></A>
<H3>The present-tense fragment</H3> <H3>The present-tense fragment</H3>
<P> <P>
Some lines in the resource library are suffixed with the comment Some lines in the resource library are suffixed with the comment
@@ -229,9 +176,7 @@ implementation. To compile a grammar with present-tense-only, use
i -preproc=GF/lib/resource-1.0/mkPresent LangGer.gf i -preproc=GF/lib/resource-1.0/mkPresent LangGer.gf
</PRE> </PRE>
<P></P> <P></P>
<A NAME="toc8"></A>
<H2>Phases of the work</H2> <H2>Phases of the work</H2>
<A NAME="toc9"></A>
<H3>Putting up a directory</H3> <H3>Putting up a directory</H3>
<P> <P>
Unless you are writing an instance of a parametrized implementation Unless you are writing an instance of a parametrized implementation
@@ -317,7 +262,6 @@ as e.g. <CODE>VerbGer</CODE>.
<P> <P>
<IMG ALIGN="middle" SRC="German.png" BORDER="0" ALT=""> <IMG ALIGN="middle" SRC="German.png" BORDER="0" ALT="">
</P> </P>
<A NAME="toc10"></A>
<H3>Direction of work</H3> <H3>Direction of work</H3>
<P> <P>
The real work starts now. There are many ways to proceed, the main ones being The real work starts now. There are many ways to proceed, the main ones being
@@ -416,7 +360,6 @@ and dependences there are in your language, and you can now produce very
much in the order you please. much in the order you please.
</OL> </OL>
<A NAME="toc11"></A>
<H3>The develop-test cycle</H3> <H3>The develop-test cycle</H3>
<P> <P>
The following develop-test cycle will The following develop-test cycle will
@@ -473,7 +416,6 @@ follow soon. (You will find out that these explanations involve
a rational reconstruction of the live process! Among other things, the a rational reconstruction of the live process! Among other things, the
API was changed during the actual process to make it more intuitive.) API was changed during the actual process to make it more intuitive.)
</P> </P>
<A NAME="toc12"></A>
<H3>Resource modules used</H3> <H3>Resource modules used</H3>
<P> <P>
These modules will be written by you. These modules will be written by you.
@@ -492,8 +434,9 @@ package.
</P> </P>
<UL> <UL>
<LI><CODE>ParamX</CODE>: parameter types used in many languages <LI><CODE>ParamX</CODE>: parameter types used in many languages
<LI><CODE>CommonX</CODE>: implementation of the categories $Text$ and $Phr$, as well as of <LI><CODE>CommonX</CODE>: implementation of language-uniform categories
the logical tense, anteriority, and polarity parameters such as $Text$ and $Phr$, as well as of
the logical tense, anteriority, and polarity parameters
<LI><CODE>Coordination</CODE>: operations to deal with lists and coordination <LI><CODE>Coordination</CODE>: operations to deal with lists and coordination
<LI><CODE>Prelude</CODE>: general-purpose operations on strings, records, <LI><CODE>Prelude</CODE>: general-purpose operations on strings, records,
truth values, etc. truth values, etc.
@@ -529,7 +472,6 @@ almost everything. This led in practice to the duplication of almost
all code on the <CODE>lin</CODE> and <CODE>oper</CODE> levels, and made the code all code on the <CODE>lin</CODE> and <CODE>oper</CODE> levels, and made the code
hard to understand and maintain. hard to understand and maintain.
</P> </P>
<A NAME="toc13"></A>
<H3>Morphology and lexicon</H3> <H3>Morphology and lexicon</H3>
<P> <P>
The paradigms needed to implement The paradigms needed to implement
@@ -600,7 +542,6 @@ These constants are defined in terms of parameter types and constructors
in <CODE>ResGer</CODE> and <CODE>MorphoGer</CODE>, which modules are not in <CODE>ResGer</CODE> and <CODE>MorphoGer</CODE>, which modules are not
visible to the application grammarian. visible to the application grammarian.
</P> </P>
<A NAME="toc14"></A>
<H3>Lock fields</H3> <H3>Lock fields</H3>
<P> <P>
An important difference between <CODE>MorphoGer</CODE> and An important difference between <CODE>MorphoGer</CODE> and
@@ -611,8 +552,8 @@ record types in a resource module, such as <CODE>ParadigmsGer</CODE>,
a <B>lock field</B> is added to the record, so that categories a <B>lock field</B> is added to the record, so that categories
with the same implementation are not confused with each other. with the same implementation are not confused with each other.
(This is inspired by the <CODE>newtype</CODE> discipline in Haskell.) (This is inspired by the <CODE>newtype</CODE> discipline in Haskell.)
For instance, the lincats of adverbs and conjunctions may be the same For instance, the lincats of adverbs and conjunctions are the same
in <CODE>CatGer</CODE>: in <CODE>CommonX</CODE> (and therefore in <CODE>CatGer</CODE>, which inherits it):
</P> </P>
<PRE> <PRE>
lincat Adv = {s : Str} ; lincat Adv = {s : Str} ;
@@ -647,7 +588,6 @@ in her hidden definitions of constants in <CODE>Paradigms</CODE>. For instance,
-- mkAdv s = {s = s ; lock_Adv = &lt;&gt;} ; -- mkAdv s = {s = s ; lock_Adv = &lt;&gt;} ;
</PRE> </PRE>
<P></P> <P></P>
<A NAME="toc15"></A>
<H3>Lexicon construction</H3> <H3>Lexicon construction</H3>
<P> <P>
The lexicon belonging to <CODE>LangGer</CODE> consists of two modules: The lexicon belonging to <CODE>LangGer</CODE> consists of two modules:
@@ -667,20 +607,17 @@ the coverage of the paradigms gets thereby tested and that the
use of the paradigms in <CODE>LexiconGer</CODE> gives a good set of examples for use of the paradigms in <CODE>LexiconGer</CODE> gives a good set of examples for
those who want to build new lexica. those who want to build new lexica.
</P> </P>
<A NAME="toc16"></A>
<H2>Inside grammar modules</H2> <H2>Inside grammar modules</H2>
<P> <P>
Detailed implementation tricks Detailed implementation tricks
are found in the comments of each module. are found in the comments of each module.
</P> </P>
<A NAME="toc17"></A>
<H3>The category system</H3> <H3>The category system</H3>
<UL> <UL>
<LI><A HREF="gfdoc/Common.html">Common</A>, <A HREF="../common/CommonX.gf">CommonX</A> <LI><A HREF="gfdoc/Common.html">Common</A>, <A HREF="../common/CommonX.gf">CommonX</A>
<LI><A HREF="gfdoc/Cat.html">Cat</A>, <A HREF="gfdoc/CatGer.gf">CatGer</A> <LI><A HREF="gfdoc/Cat.html">Cat</A>, <A HREF="gfdoc/CatGer.gf">CatGer</A>
</UL> </UL>
<A NAME="toc18"></A>
<H3>Phrase category modules</H3> <H3>Phrase category modules</H3>
<UL> <UL>
<LI><A HREF="gfdoc/Noun.html">Noun</A>, <A HREF="../german/NounGer.gf">NounGer</A> <LI><A HREF="gfdoc/Noun.html">Noun</A>, <A HREF="../german/NounGer.gf">NounGer</A>
@@ -698,7 +635,6 @@ are found in the comments of each module.
<LI><A HREF="gfdoc/Lang.html">Lang</A>, <A HREF="../german/LangGer.gf">LangGer</A> <LI><A HREF="gfdoc/Lang.html">Lang</A>, <A HREF="../german/LangGer.gf">LangGer</A>
</UL> </UL>
<A NAME="toc19"></A>
<H3>Resource modules</H3> <H3>Resource modules</H3>
<UL> <UL>
<LI><A HREF="../german/ResGer.gf">ResGer</A> <LI><A HREF="../german/ResGer.gf">ResGer</A>
@@ -706,16 +642,13 @@ are found in the comments of each module.
<LI><A HREF="gfdoc/ParadigmsGer.html">ParadigmsGer</A>, <A HREF="../german/ParadigmsGer.gf">ParadigmsGer.gf</A> <LI><A HREF="gfdoc/ParadigmsGer.html">ParadigmsGer</A>, <A HREF="../german/ParadigmsGer.gf">ParadigmsGer.gf</A>
</UL> </UL>
<A NAME="toc20"></A>
<H3>Lexicon</H3> <H3>Lexicon</H3>
<UL> <UL>
<LI><A HREF="gfdoc/Structural.html">Structural</A>, <A HREF="../german/StructuralGer.gf">StructuralGer</A> <LI><A HREF="gfdoc/Structural.html">Structural</A>, <A HREF="../german/StructuralGer.gf">StructuralGer</A>
<LI><A HREF="gfdoc/Lexicon.html">Lexicon</A>, <A HREF="../german/LexiconGer.gf">LexiconGer</A> <LI><A HREF="gfdoc/Lexicon.html">Lexicon</A>, <A HREF="../german/LexiconGer.gf">LexiconGer</A>
</UL> </UL>
<A NAME="toc21"></A>
<H2>Lexicon extension</H2> <H2>Lexicon extension</H2>
<A NAME="toc22"></A>
<H3>The irregularity lexicon</H3> <H3>The irregularity lexicon</H3>
<P> <P>
It may be handy to provide a separate module of irregular It may be handy to provide a separate module of irregular
@@ -725,7 +658,6 @@ few hundred perhaps. Building such a lexicon separately also
makes it less important to cover <I>everything</I> by the makes it less important to cover <I>everything</I> by the
worst-case paradigms (<CODE>mkV</CODE> etc). worst-case paradigms (<CODE>mkV</CODE> etc).
</P> </P>
<A NAME="toc23"></A>
<H3>Lexicon extraction from a word list</H3> <H3>Lexicon extraction from a word list</H3>
<P> <P>
You can often find resources such as lists of You can often find resources such as lists of
@@ -760,7 +692,6 @@ When using ready-made word lists, you should think about
copyright issues. Ideally, all resource grammar material should copyright issues. Ideally, all resource grammar material should
be provided under GNU General Public License. be provided under GNU General Public License.
</P> </P>
<A NAME="toc24"></A>
<H3>Lexicon extraction from raw text data</H3> <H3>Lexicon extraction from raw text data</H3>
<P> <P>
This is a cheap technique to build a lexicon of thousands This is a cheap technique to build a lexicon of thousands
@@ -768,7 +699,6 @@ of words, if text data is available in digital format.
See the <A HREF="http://www.cs.chalmers.se/~markus/FM/">Functional Morphology</A> See the <A HREF="http://www.cs.chalmers.se/~markus/FM/">Functional Morphology</A>
homepage for details. homepage for details.
</P> </P>
<A NAME="toc25"></A>
<H3>Extending the resource grammar API</H3> <H3>Extending the resource grammar API</H3>
<P> <P>
Sooner or later it will happen that the resource grammar API Sooner or later it will happen that the resource grammar API
@@ -777,7 +707,6 @@ that it does not include idiomatic expressions in a given language.
The solution then is in the first place to build language-specific The solution then is in the first place to build language-specific
extension modules. This chapter will deal with this issue (to be completed). extension modules. This chapter will deal with this issue (to be completed).
</P> </P>
<A NAME="toc26"></A>
<H2>Writing an instance of parametrized resource grammar implementation</H2> <H2>Writing an instance of parametrized resource grammar implementation</H2>
<P> <P>
Above we have looked at how a resource implementation is built by Above we have looked at how a resource implementation is built by
@@ -797,7 +726,6 @@ the Romance family (to be completed). Here is a set of
<A HREF="http://www.cs.chalmers.se/~aarne/geocal2006.pdf">slides</A> <A HREF="http://www.cs.chalmers.se/~aarne/geocal2006.pdf">slides</A>
on the topic. on the topic.
</P> </P>
<A NAME="toc27"></A>
<H2>Parametrizing a resource grammar implementation</H2> <H2>Parametrizing a resource grammar implementation</H2>
<P> <P>
This is the most demanding form of resource grammar writing. This is the most demanding form of resource grammar writing.
@@ -813,6 +741,6 @@ This chapter will work out an example of how an Estonian grammar
is constructed from the Finnish grammar through parametrization. is constructed from the Finnish grammar through parametrization.
</P> </P>
<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) --> <!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc -thtml Resource-HOWTO.txt --> <!-- cmdline: txt2tags Resource-HOWTO.txt -->
</BODY></HTML> </BODY></HTML>


@@ -422,8 +422,9 @@ These modules are language-independent and provided by the existing resource
package. package.
- ``ParamX``: parameter types used in many languages - ``ParamX``: parameter types used in many languages
- ``CommonX``: implementation of the categories $Text$ and $Phr$, as well as of - ``CommonX``: implementation of language-uniform categories
the logical tense, anteriority, and polarity parameters such as $Text$ and $Phr$, as well as of
the logical tense, anteriority, and polarity parameters
- ``Coordination``: operations to deal with lists and coordination - ``Coordination``: operations to deal with lists and coordination
- ``Prelude``: general-purpose operations on strings, records, - ``Prelude``: general-purpose operations on strings, records,
truth values, etc. truth values, etc.
@@ -533,8 +534,8 @@ record types in a resource module, such as ``ParadigmsGer``,
a **lock field** is added to the record, so that categories a **lock field** is added to the record, so that categories
with the same implementation are not confused with each other. with the same implementation are not confused with each other.
(This is inspired by the ``newtype`` discipline in Haskell.) (This is inspired by the ``newtype`` discipline in Haskell.)
For instance, the lincats of adverbs and conjunctions may be the same For instance, the lincats of adverbs and conjunctions are the same
in ``CatGer``: in ``CommonX`` (and therefore in ``CatGer``, which inherits it):
``` ```
lincat Adv = {s : Str} ; lincat Adv = {s : Str} ;
lincat Conj = {s : Str} ; lincat Conj = {s : Str} ;