HOWTO update

Author: aarne
Date: 2006-01-05 22:20:35 +00:00
Parent: 92487d7dcf
Commit: a8a1f91e46
3 changed files with 83 additions and 77 deletions

Binary file not shown (image, 11 KiB).

@@ -2,66 +2,54 @@
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>Resource grammar writing HOWTO</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>Resource grammar writing HOWTO</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Thu Jan 5 23:19:40 2006
</FONT></CENTER>
<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">HOW TO WRITE A RESOURCE GRAMMAR</A>
<UL>
<LI><A HREF="#toc2">The resource grammar API</A>
<LI><A HREF="#toc1">The resource grammar API</A>
<UL>
<LI><A HREF="#toc3">Phrase category modules</A>
<LI><A HREF="#toc4">Infrastructure modules</A>
<LI><A HREF="#toc5">Lexical modules</A>
<LI><A HREF="#toc2">Phrase category modules</A>
<LI><A HREF="#toc3">Infrastructure modules</A>
<LI><A HREF="#toc4">Lexical modules</A>
</UL>
<LI><A HREF="#toc6">Phases of the work</A>
<LI><A HREF="#toc5">Phases of the work</A>
<UL>
<LI><A HREF="#toc7">Putting up a directory</A>
<LI><A HREF="#toc8">The develop-test cycle</A>
<LI><A HREF="#toc9">Resource modules used</A>
<LI><A HREF="#toc10">Morphology and lexicon</A>
<LI><A HREF="#toc11">Lock fields</A>
<LI><A HREF="#toc12">Lexicon construction</A>
<LI><A HREF="#toc6">Putting up a directory</A>
<LI><A HREF="#toc7">The develop-test cycle</A>
<LI><A HREF="#toc8">Resource modules used</A>
<LI><A HREF="#toc9">Morphology and lexicon</A>
<LI><A HREF="#toc10">Lock fields</A>
<LI><A HREF="#toc11">Lexicon construction</A>
</UL>
<LI><A HREF="#toc13">Inside phrase category modules</A>
<LI><A HREF="#toc12">Inside phrase category modules</A>
<UL>
<LI><A HREF="#toc14">Noun</A>
<LI><A HREF="#toc15">Verb</A>
<LI><A HREF="#toc16">Adjective</A>
<LI><A HREF="#toc13">Noun</A>
<LI><A HREF="#toc14">Verb</A>
<LI><A HREF="#toc15">Adjective</A>
</UL>
<LI><A HREF="#toc17">Lexicon extension</A>
<LI><A HREF="#toc16">Lexicon extension</A>
<UL>
<LI><A HREF="#toc18">The irregularity lexicon</A>
<LI><A HREF="#toc19">Lexicon extraction from a word list</A>
<LI><A HREF="#toc20">Lexicon extraction from raw text data</A>
<LI><A HREF="#toc21">Extending the resource grammar API</A>
<LI><A HREF="#toc17">The irregularity lexicon</A>
<LI><A HREF="#toc18">Lexicon extraction from a word list</A>
<LI><A HREF="#toc19">Lexicon extraction from raw text data</A>
<LI><A HREF="#toc20">Extending the resource grammar API</A>
</UL>
<LI><A HREF="#toc22">Writing an instance of a parametrized resource grammar implementation</A>
<LI><A HREF="#toc23">Parametrizing a resource grammar implementation</A>
<LI><A HREF="#toc21">Writing an instance of a parametrized resource grammar implementation</A>
<LI><A HREF="#toc22">Parametrizing a resource grammar implementation</A>
</UL>
</UL>
<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
Resource grammar HOWTO
Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;
Last update: Wed Jan 4 11:29:41 2006
</P>
<A NAME="toc1"></A>
<H1>HOW TO WRITE A RESOURCE GRAMMAR</H1>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/">Aarne Ranta</A>
</P>
<P>
20060104
</P>
<P>
The purpose of this document is to explain how to implement the GF
resource grammar API for a new language. We will <I>not</I> cover how
to use the resource grammar, nor how to change the API. But we
@@ -74,7 +62,7 @@ in <A HREF=".."><CODE>GF/lib/resource-1.0/</CODE></A>. See the
<A HREF="../README"><CODE>resource-1.0/README</CODE></A> for
details on how this differs from previous versions.
</P>
<A NAME="toc2"></A>
<A NAME="toc1"></A>
<H2>The resource grammar API</H2>
<P>
The API is divided into a number of <CODE>abstract</CODE> modules.
@@ -97,7 +85,7 @@ parent of the top module (<CODE>Lang</CODE> or <CODE>Test</CODE>). The idea
is that you can concentrate on one linguistic aspect at a time, or
also distribute the work among several authors.
</P>
<A NAME="toc3"></A>
<A NAME="toc2"></A>
<H3>Phrase category modules</H3>
<P>
The direct parents of the top could be called <B>phrase category modules</B>,
@@ -120,7 +108,7 @@ one of a small number of different types). Thus we have
<LI><CODE>Phrase</CODE>: construction of the major units of text and speech
</UL>
<A NAME="toc4"></A>
<A NAME="toc3"></A>
<H3>Infrastructure modules</H3>
<P>
Expressions of each phrase category are constructed in the corresponding
@@ -163,7 +151,7 @@ modules:
The full resource API (<CODE>Lang</CODE>) uses <CODE>Tensed</CODE>, whereas the
restricted <CODE>Test</CODE> API uses <CODE>Untensed</CODE>.
</P>
<A NAME="toc5"></A>
<A NAME="toc4"></A>
<H3>Lexical modules</H3>
<P>
What is lexical and what is syntactic is not as clear-cut in GF as in
@@ -205,9 +193,9 @@ different languages on the level of a resource grammar. In other words,
application grammars are likely to use the resource in different ways for
different languages.
</P>
<A NAME="toc6"></A>
<A NAME="toc5"></A>
<H2>Phases of the work</H2>
<A NAME="toc7"></A>
<A NAME="toc6"></A>
<H3>Putting up a directory</H3>
<P>
Unless you are writing an instance of a parametrized implementation
@@ -260,21 +248,33 @@ of resource v. 1.0.
This will give you a set of templates out of which the grammar
will grow as you uncomment and modify the files rule by rule.
<P></P>
<LI>In the file <CODE>TestGer.gf</CODE>, uncomment all lines except the list
of inherited modules. Now you can open the grammar in GF:
<LI>In all <CODE>.gf</CODE> files, uncomment the module headers and brackets,
leaving the module bodies commented. Unfortunately, there is no
simple way to do this automatically (or to avoid commenting these
lines in the previous step), but uncommenting the first
and the last lines will actually do the job for many of the files.
<P></P>
<LI>Now you can open the grammar <CODE>TestGer</CODE> in GF:
<PRE>
gf TestGer.gf
</PRE>
You will get lots of warnings on missing rules, but the grammar will compile.
<P></P>
<LI>Now you will at all following steps have a valid, but incomplete
<LI>At all following steps you will now have a valid, but incomplete
GF grammar. The GF command
<PRE>
pg -printer=missing
</PRE>
tells you what exactly is missing.
<P></P>
Here is the module structure of <CODE>TestGer</CODE>. It has been simplified by leaving out
the majority of the phrase category modules. Each of them has the same dependencies
as e.g. <CODE>VerbGer</CODE>.
<P></P>
<IMG ALIGN="middle" SRC="German.png" BORDER="0" ALT="">
</OL>
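<P>
The step of uncommenting the module headers and brackets can be partly scripted along the lines of the first-and-last-line heuristic described above. What follows is a minimal Python sketch, not part of the GF distribution, assuming that the previous step commented out every line by prefixing it with the GF line-comment marker <CODE>--</CODE>. It removes one such marker from the first and the last non-blank line of each file given on the command line and leaves everything else commented; files whose header or closing bracket is not on those lines still need editing by hand.
</P>

```python
import sys
from pathlib import Path

# Assumption: the previous step commented out every line with GF's
# line-comment marker "--".
COMMENT = "--"


def uncomment_line(line: str) -> str:
    """Drop one leading '--' (plus one following space) from a line,
    keeping its indentation; lines without the marker are unchanged."""
    stripped = line.lstrip()
    if not stripped.startswith(COMMENT):
        return line
    indent = line[: len(line) - len(stripped)]
    rest = stripped[len(COMMENT):]
    if rest.startswith(" "):
        rest = rest[1:]
    return indent + rest


def uncomment_edges(text: str) -> str:
    """Uncomment the first and the last non-blank lines of a source text,
    which for many module files are the header and the closing bracket."""
    lines = text.split("\n")
    nonblank = [i for i, ln in enumerate(lines) if ln.strip()]
    if not nonblank:
        return text
    for i in {nonblank[0], nonblank[-1]}:
        lines[i] = uncomment_line(lines[i])
    return "\n".join(lines)


def main(paths: list[str]) -> None:
    """Rewrite each given .gf file in place."""
    for name in paths:
        path = Path(name)
        # latin-1 maps every byte to one character, so the files
        # round-trip unchanged whatever their actual encoding is.
        text = path.read_text(encoding="latin-1")
        path.write_text(uncomment_edges(text), encoding="latin-1")


if __name__ == "__main__":
    main(sys.argv[1:])
```

<P>
For instance, <CODE>python uncomment_edges.py *.gf</CODE> (the script name is only an example) rewrites the files in place, so it is safest to run it on a copy or under version control.
</P>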
<A NAME="toc8"></A>
<A NAME="toc7"></A>
<H3>The develop-test cycle</H3>
<P>
The real work starts now. The order in which the <CODE>Phrase</CODE> modules
@@ -330,7 +330,14 @@ with the next one. Actually, a suitable subset of <CODE>Noun</CODE>,
<CODE>Verb</CODE>, and <CODE>Adjective</CODE> will lead to a reasonable coverage
very soon, keep you motivated, and reveal errors.
</P>
<A NAME="toc9"></A>
<P>
Here is a <A HREF="../german/log.txt">live log</A> of the actual process of
building the German implementation of resource API v. 1.0.
It is the basis of the more detailed explanations, which will
follow soon. (You will find out that these explanations involve
a rational reconstruction of the live process!)
</P>
<A NAME="toc8"></A>
<H3>Resource modules used</H3>
<P>
These modules will be written by you.
@@ -354,7 +361,7 @@ package.
<LI><CODE>Predefined</CODE>: general-purpose operations with hard-coded definitions
</UL>
<A NAME="toc10"></A>
<A NAME="toc9"></A>
<H3>Morphology and lexicon</H3>
<P>
When the implementation of <CODE>Test</CODE> is complete, it is time to
@@ -434,7 +441,7 @@ These constants are defined in terms of parameter types and constructors
in <CODE>ResGer</CODE> and <CODE>MorphoGer</CODE>, modules that are not
accessible to the application grammarian.
</P>
<A NAME="toc11"></A>
<A NAME="toc10"></A>
<H3>Lock fields</H3>
<P>
An important difference between <CODE>MorphoGer</CODE> and
@@ -481,7 +488,7 @@ in her hidden definitions of constants in <CODE>Paradigms</CODE>. For instance,
-- mkAdv s = {s = s ; lock_Adv = &lt;&gt;} ;
</PRE>
<P></P>
<A NAME="toc12"></A>
<A NAME="toc11"></A>
<H3>Lexicon construction</H3>
<P>
The lexicon belonging to <CODE>LangGer</CODE> consists of two modules:
@@ -501,17 +508,17 @@ the coverage of the paradigms gets thereby tested and that the
use of the paradigms in <CODE>BasicGer</CODE> gives a good set of examples for
those who want to build new lexica.
</P>
<A NAME="toc13"></A>
<A NAME="toc12"></A>
<H2>Inside phrase category modules</H2>
<A NAME="toc14"></A>
<A NAME="toc13"></A>
<H3>Noun</H3>
<A NAME="toc15"></A>
<A NAME="toc14"></A>
<H3>Verb</H3>
<A NAME="toc16"></A>
<A NAME="toc15"></A>
<H3>Adjective</H3>
<A NAME="toc17"></A>
<A NAME="toc16"></A>
<H2>Lexicon extension</H2>
<A NAME="toc18"></A>
<A NAME="toc17"></A>
<H3>The irregularity lexicon</H3>
<P>
It may be handy to provide a separate module of irregular
@@ -521,7 +528,7 @@ few hundred perhaps. Building such a lexicon separately also
makes it less important to cover <I>everything</I> by the
worst-case paradigms (<CODE>mkV</CODE>, etc.).
</P>
<A NAME="toc19"></A>
<A NAME="toc18"></A>
<H3>Lexicon extraction from a word list</H3>
<P>
You can often find resources such as lists of
@@ -556,7 +563,7 @@ When using ready-made word lists, you should think about
copyright issues. Ideally, all resource grammar material should
be provided under the GNU General Public License.
</P>
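<P>
As an illustration of turning a word list into lexicon entries, here is a minimal Python sketch. It assumes a hypothetical input format with one word per line, followed by a tab and a category tag (<CODE>N</CODE>, <CODE>A</CODE> or <CODE>V</CODE>), and it prints one <CODE>lin</CODE> rule per word. The paradigm names <CODE>regN</CODE>, <CODE>regA</CODE> and <CODE>regV</CODE> are placeholders: replace them with whatever one-argument paradigms your <CODE>Paradigms</CODE> module actually provides, and check the output, since a regular paradigm will get irregular words wrong.
</P>

```python
import re
import sys

# Hypothetical paradigm names: replace them with the one-argument
# paradigms that your Paradigms module actually exports.
PARADIGMS = {"N": "regN", "A": "regA", "V": "regV"}


def gf_ident(word: str, cat: str) -> str:
    """Build an identifier such as Haus_N, turning every character that
    is not a letter, digit or underscore into an underscore."""
    return re.sub(r"\W", "_", word) + "_" + cat


def entry(word: str, cat: str, paradigms: dict[str, str] = PARADIGMS) -> str:
    """Return the linearization rule for one (word, category) pair."""
    return f'lin {gf_ident(word, cat)} = {paradigms[cat]} "{word}" ;'


def convert(lines, paradigms: dict[str, str] = PARADIGMS) -> list[str]:
    """Turn lines of the form word TAB category into lin rules, skipping
    blank or malformed lines and categories without a known paradigm."""
    rules = []
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue
        word, cat = parts
        if cat in paradigms:
            rules.append(entry(word, cat, paradigms))
    return rules


if __name__ == "__main__":
    # Assumes UTF-8 word lists given as command-line arguments.
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as f:
            for rule in convert(f):
                print(rule)
```

<P>
The generated rules belong in a concrete lexicon module, and each identifier also needs a matching <CODE>fun</CODE> declaration in the corresponding abstract module.
</P>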
<A NAME="toc20"></A>
<A NAME="toc19"></A>
<H3>Lexicon extraction from raw text data</H3>
<P>
This is a cheap technique to build a lexicon of thousands
@@ -564,7 +571,7 @@ of words, if text data is available in digital format.
See the <A HREF="http://www.cs.chalmers.se/~markus/FM/">Functional Morphology</A>
homepage for details.
</P>
<A NAME="toc21"></A>
<A NAME="toc20"></A>
<H3>Extending the resource grammar API</H3>
<P>
Sooner or later it will happen that the resource grammar API
@@ -573,7 +580,7 @@ that it does not include idiomatic expressions in a given language.
The solution then is in the first place to build language-specific
extension modules. This chapter will deal with this issue.
</P>
<A NAME="toc22"></A>
<A NAME="toc21"></A>
<H2>Writing an instance of a parametrized resource grammar implementation</H2>
<P>
Above we have looked at how a resource implementation is built by
@@ -591,7 +598,7 @@ use parametrized modules. The advantages are
In this chapter, we will look at an example: adding Portuguese to
the Romance family.
</P>
<A NAME="toc23"></A>
<A NAME="toc22"></A>
<H2>Parametrizing a resource grammar implementation</H2>
<P>
This is the most demanding form of resource grammar writing.
@@ -607,6 +614,6 @@ This chapter will work out an example of how an Estonian grammar
is constructed from the Finnish grammar through parametrization.
</P>
<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc -thtml Resource-HOWTO.txt -->
</BODY></HTML>


@@ -1,5 +1,4 @@
Resource grammar HOWTO
Resource grammar writing HOWTO
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
Last update: %%date(%c)
@@ -10,23 +9,12 @@ Last update: %%date(%c)
%!target:html
=HOW TO WRITE A RESOURCE GRAMMAR=
[Aarne Ranta http://www.cs.chalmers.se/~aarne/]
%%Date
The purpose of this document is to explain how to implement the GF
resource grammar API for a new language. We will //not// cover how
to use the resource grammar, nor how to change the API. But we
will give some hints on how to extend the API.
**Notice**. This document concerns the API v. 1.0, which has not
yet been released. You can find the beginnings of it
in [``GF/lib/resource-1.0/`` ..]. See the
@@ -240,7 +228,11 @@ of resource v. 1.0.
```
tells you what exactly is missing.
Here is the module structure of ``TestGer``. It has been simplified by leaving out
the majority of the phrase category modules. Each of them has the same dependencies
as e.g. ``VerbGer``.
[German.png]
===The develop-test cycle===
@@ -298,6 +290,13 @@ with the next one. Actually, a suitable subset of ``Noun``,
very soon, keep you motivated, and reveal errors.
Here is a [live log ../german/log.txt] of the actual process of
building the German implementation of resource API v. 1.0.
It is the basis of the more detailed explanations, which will
follow soon. (You will find out that these explanations involve
a rational reconstruction of the live process!)
===Resource modules used===
These modules will be written by you.