mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-09 04:59:31 -06:00
HOWTO update
BIN
lib/resource-1.0/doc/German.png
Normal file
Binary file not shown.
Size: 11 KiB
@@ -2,66 +2,54 @@
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
|
||||
<TITLE>Resource grammar writing HOWTO</TITLE>
|
||||
</HEAD><BODY BGCOLOR="white" TEXT="black">
|
||||
<P ALIGN="center"><CENTER><H1>Resource grammar writing HOWTO</H1>
|
||||
<FONT SIZE="4">
|
||||
<I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR>
|
||||
Last update: Thu Jan 5 23:19:40 2006
|
||||
</FONT></CENTER>
|
||||
|
||||
<P></P>
|
||||
<HR NOSHADE SIZE=1>
|
||||
<P></P>
|
||||
<UL>
|
||||
<LI><A HREF="#toc1">HOW TO WRITE A RESOURCE GRAMMAR</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc2">The resource grammar API</A>
|
||||
<LI><A HREF="#toc1">The resource grammar API</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc3">Phrase category modules</A>
|
||||
<LI><A HREF="#toc4">Infrastructure modules</A>
|
||||
<LI><A HREF="#toc5">Lexical modules</A>
|
||||
<LI><A HREF="#toc2">Phrase category modules</A>
|
||||
<LI><A HREF="#toc3">Infrastructure modules</A>
|
||||
<LI><A HREF="#toc4">Lexical modules</A>
|
||||
</UL>
|
||||
<LI><A HREF="#toc6">Phases of the work</A>
|
||||
<LI><A HREF="#toc5">Phases of the work</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc7">Putting up a directory</A>
|
||||
<LI><A HREF="#toc8">The develop-test cycle</A>
|
||||
<LI><A HREF="#toc9">Resource modules used</A>
|
||||
<LI><A HREF="#toc10">Morphology and lexicon</A>
|
||||
<LI><A HREF="#toc11">Lock fields</A>
|
||||
<LI><A HREF="#toc12">Lexicon construction</A>
|
||||
<LI><A HREF="#toc6">Putting up a directory</A>
|
||||
<LI><A HREF="#toc7">The develop-test cycle</A>
|
||||
<LI><A HREF="#toc8">Resource modules used</A>
|
||||
<LI><A HREF="#toc9">Morphology and lexicon</A>
|
||||
<LI><A HREF="#toc10">Lock fields</A>
|
||||
<LI><A HREF="#toc11">Lexicon construction</A>
|
||||
</UL>
|
||||
<LI><A HREF="#toc13">Inside phrase category modules</A>
|
||||
<LI><A HREF="#toc12">Inside phrase category modules</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc14">Noun</A>
|
||||
<LI><A HREF="#toc15">Verb</A>
|
||||
<LI><A HREF="#toc16">Adjective</A>
|
||||
<LI><A HREF="#toc13">Noun</A>
|
||||
<LI><A HREF="#toc14">Verb</A>
|
||||
<LI><A HREF="#toc15">Adjective</A>
|
||||
</UL>
|
||||
<LI><A HREF="#toc17">Lexicon extension</A>
|
||||
<LI><A HREF="#toc16">Lexicon extension</A>
|
||||
<UL>
|
||||
<LI><A HREF="#toc18">The irregularity lexicon</A>
|
||||
<LI><A HREF="#toc19">Lexicon extraction from a word list</A>
|
||||
<LI><A HREF="#toc20">Lexicon extraction from raw text data</A>
|
||||
<LI><A HREF="#toc21">Extending the resource grammar API</A>
|
||||
<LI><A HREF="#toc17">The irregularity lexicon</A>
|
||||
<LI><A HREF="#toc18">Lexicon extraction from a word list</A>
|
||||
<LI><A HREF="#toc19">Lexicon extraction from raw text data</A>
|
||||
<LI><A HREF="#toc20">Extending the resource grammar API</A>
|
||||
</UL>
|
||||
<LI><A HREF="#toc22">Writing an instance of parametrized resource grammar implementation</A>
|
||||
<LI><A HREF="#toc23">Parametrizing a resource grammar implementation</A>
|
||||
<LI><A HREF="#toc21">Writing an instance of parametrized resource grammar implementation</A>
|
||||
<LI><A HREF="#toc22">Parametrizing a resource grammar implementation</A>
|
||||
</UL>
|
||||
</UL>
|
||||
|
||||
<P></P>
|
||||
<HR NOSHADE SIZE=1>
|
||||
<P></P>
|
||||
<P>
|
||||
Resource grammar HOWTO
|
||||
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
|
||||
Last update: Wed Jan 4 11:29:41 2006
|
||||
</P>
|
||||
<A NAME="toc1"></A>
|
||||
<H1>HOW TO WRITE A RESOURCE GRAMMAR</H1>
|
||||
<P>
|
||||
<A HREF="http://www.cs.chalmers.se/~aarne/">Aarne Ranta</A>
|
||||
</P>
|
||||
<P>
|
||||
20060104
|
||||
</P>
|
||||
<P>
|
||||
The purpose of this document is to tell how to implement the GF
|
||||
resource grammar API for a new language. We will <I>not</I> cover how
|
||||
to use the resource grammar, nor how to change the API. But we
|
||||
@@ -74,7 +62,7 @@ in <A HREF=".."><CODE>GF/lib/resource-1.0/</CODE></A>. See the
|
||||
<A HREF="../README"><CODE>resource-1.0/README</CODE></A> for
|
||||
details on how this differs from previous versions.
|
||||
</P>
|
||||
<A NAME="toc2"></A>
|
||||
<A NAME="toc1"></A>
|
||||
<H2>The resource grammar API</H2>
|
||||
<P>
|
||||
The API is divided into a bunch of <CODE>abstract</CODE> modules.
|
||||
@@ -97,7 +85,7 @@ parent of the top module (<CODE>Lang</CODE> or <CODE>Test</CODE>). The idea
|
||||
is that you can concentrate on one linguistic aspect at a time, or
|
||||
also distribute the work among several authors.
|
||||
</P>
|
||||
<A NAME="toc3"></A>
|
||||
<A NAME="toc2"></A>
|
||||
<H3>Phrase category modules</H3>
|
||||
<P>
|
||||
The direct parents of the top could be called <B>phrase category modules</B>,
|
||||
@@ -120,7 +108,7 @@ one of a small number of different types). Thus we have
|
||||
<LI><CODE>Phrase</CODE>: construction of the major units of text and speech
|
||||
</UL>
|
||||
|
||||
<A NAME="toc4"></A>
|
||||
<A NAME="toc3"></A>
|
||||
<H3>Infrastructure modules</H3>
|
||||
<P>
|
||||
Expressions of each phrase category are constructed in the corresponding
|
||||
@@ -163,7 +151,7 @@ modules:
|
||||
The full resource API (<CODE>Lang</CODE>) uses <CODE>Tensed</CODE>, whereas the
|
||||
restricted <CODE>Test</CODE> API uses <CODE>Untensed</CODE>.
|
||||
</P>
|
||||
<A NAME="toc5"></A>
|
||||
<A NAME="toc4"></A>
|
||||
<H3>Lexical modules</H3>
|
||||
<P>
|
||||
What is lexical and what is syntactic is not as clearcut in GF as in
|
||||
@@ -205,9 +193,9 @@ different languages on the level of a resource grammar. In other words,
|
||||
application grammars are likely to use the resource in different ways for
|
||||
different languages.
|
||||
</P>
|
||||
<A NAME="toc6"></A>
|
||||
<A NAME="toc5"></A>
|
||||
<H2>Phases of the work</H2>
|
||||
<A NAME="toc7"></A>
|
||||
<A NAME="toc6"></A>
|
||||
<H3>Putting up a directory</H3>
|
||||
<P>
|
||||
Unless you are writing an instance of a parametrized implementation
|
||||
@@ -260,21 +248,33 @@ of resource v. 1.0.
|
||||
This will give you a set of templates out of which the grammar
|
||||
will grow as you uncomment and modify the files rule by rule.
|
||||
<P></P>
|
||||
<LI>In the file <CODE>TestGer.gf</CODE>, uncomment all lines except the list
|
||||
of inherited modules. Now you can open the grammar in GF:
|
||||
<LI>In all <CODE>.gf</CODE> files, uncomment the module headers and brackets,
|
||||
leaving the module bodies commented. Unfortunately, there is no
|
||||
simple way to do this automatically (or to avoid commenting these
|
||||
lines in the previous step) - but uncommenting the first
|
||||
and the last lines will actually do the job for many of the files.
|
||||
<P></P>
|
||||
<LI>Now you can open the grammar <CODE>TestGer</CODE> in GF:
|
||||
<PRE>
|
||||
gf TestGer.gf
|
||||
</PRE>
|
||||
You will get lots of warnings on missing rules, but the grammar will compile.
|
||||
<P></P>
|
||||
<LI>Now you will at all following steps have a valid, but incomplete
|
||||
<LI>At all following steps you will now have a valid, but incomplete
|
||||
GF grammar. The GF command
|
||||
<PRE>
|
||||
pg -printer=missing
|
||||
</PRE>
|
||||
tells you what exactly is missing.
|
||||
<P></P>
|
||||
Here is the module structure of <CODE>TestGer</CODE>. It has been simplified by leaving out
|
||||
the majority of the phrase category modules. Each of them has the same dependencies
|
||||
as e.g. <CODE>VerbGer</CODE>.
|
||||
<P></P>
|
||||
<IMG ALIGN="middle" SRC="German.png" BORDER="0" ALT="">
|
||||
</OL>
|
||||
|
||||
<A NAME="toc8"></A>
|
||||
<A NAME="toc7"></A>
|
||||
<H3>The develop-test cycle</H3>
|
||||
<P>
|
||||
The real work starts now. The order in which the <CODE>Phrase</CODE> modules
|
||||
@@ -330,7 +330,14 @@ with the next one. Actually, a suitable subset of <CODE>Noun</CODE>,
|
||||
<CODE>Verb</CODE>, and <CODE>Adjective</CODE> will lead to a reasonable coverage
|
||||
very soon, keep you motivated, and reveal errors.
|
||||
</P>
|
||||
<A NAME="toc9"></A>
|
||||
<P>
|
||||
Here is a <A HREF="../german/log.txt">live log</A> of the actual process of
|
||||
building the German implementation of resource API v. 1.0.
|
||||
It is the basis of the more detailed explanations, which will
|
||||
follow soon. (You will find out that these explanations involve
|
||||
a rational reconstruction of the live process!)
|
||||
</P>
|
||||
<A NAME="toc8"></A>
|
||||
<H3>Resource modules used</H3>
|
||||
<P>
|
||||
These modules will be written by you.
|
||||
@@ -354,7 +361,7 @@ package.
|
||||
<LI><CODE>Predefined</CODE>: general-purpose operations with hard-coded definitions
|
||||
</UL>
|
||||
|
||||
<A NAME="toc10"></A>
|
||||
<A NAME="toc9"></A>
|
||||
<H3>Morphology and lexicon</H3>
|
||||
<P>
|
||||
When the implementation of <CODE>Test</CODE> is complete, it is time to
|
||||
@@ -434,7 +441,7 @@ These constants are defined in terms of parameter types and constructors
|
||||
in <CODE>ResGer</CODE> and <CODE>MorphoGer</CODE>, which modules are not
|
||||
accessible to the application grammarian.
|
||||
</P>
|
||||
<A NAME="toc11"></A>
|
||||
<A NAME="toc10"></A>
|
||||
<H3>Lock fields</H3>
|
||||
<P>
|
||||
An important difference between <CODE>MorphoGer</CODE> and
|
||||
@@ -481,7 +488,7 @@ in her hidden definitions of constants in <CODE>Paradigms</CODE>. For instance,
|
||||
-- mkAdv s = {s = s ; lock_Adv = <>} ;
|
||||
</PRE>
|
||||
<P></P>
|
||||
<A NAME="toc12"></A>
|
||||
<A NAME="toc11"></A>
|
||||
<H3>Lexicon construction</H3>
|
||||
<P>
|
||||
The lexicon belonging to <CODE>LangGer</CODE> consists of two modules:
|
||||
@@ -501,17 +508,17 @@ the coverage of the paradigms gets thereby tested and that the
|
||||
use of the paradigms in <CODE>BasicGer</CODE> gives a good set of examples for
|
||||
those who want to build new lexica.
|
||||
</P>
|
||||
<A NAME="toc13"></A>
|
||||
<A NAME="toc12"></A>
|
||||
<H2>Inside phrase category modules</H2>
|
||||
<A NAME="toc14"></A>
|
||||
<A NAME="toc13"></A>
|
||||
<H3>Noun</H3>
|
||||
<A NAME="toc15"></A>
|
||||
<A NAME="toc14"></A>
|
||||
<H3>Verb</H3>
|
||||
<A NAME="toc16"></A>
|
||||
<A NAME="toc15"></A>
|
||||
<H3>Adjective</H3>
|
||||
<A NAME="toc17"></A>
|
||||
<A NAME="toc16"></A>
|
||||
<H2>Lexicon extension</H2>
|
||||
<A NAME="toc18"></A>
|
||||
<A NAME="toc17"></A>
|
||||
<H3>The irregularity lexicon</H3>
|
||||
<P>
|
||||
It may be handy to provide a separate module of irregular
|
||||
@@ -521,7 +528,7 @@ few hundred perhaps. Building such a lexicon separately also
|
||||
makes it less important to cover <I>everything</I> by the
|
||||
worst-case paradigms (<CODE>mkV</CODE> etc).
|
||||
</P>
|
||||
<A NAME="toc19"></A>
|
||||
<A NAME="toc18"></A>
|
||||
<H3>Lexicon extraction from a word list</H3>
|
||||
<P>
|
||||
You can often find resources such as lists of
|
||||
@@ -556,7 +563,7 @@ When using ready-made word lists, you should think about
|
||||
copyright issues. Ideally, all resource grammar material should
|
||||
be provided under GNU General Public License.
|
||||
</P>
|
||||
<A NAME="toc20"></A>
|
||||
<A NAME="toc19"></A>
|
||||
<H3>Lexicon extraction from raw text data</H3>
|
||||
<P>
|
||||
This is a cheap technique to build a lexicon of thousands
|
||||
@@ -564,7 +571,7 @@ of words, if text data is available in digital format.
|
||||
See the <A HREF="http://www.cs.chalmers.se/~markus/FM/">Functional Morphology</A>
|
||||
homepage for details.
|
||||
</P>
|
||||
<A NAME="toc21"></A>
|
||||
<A NAME="toc20"></A>
|
||||
<H3>Extending the resource grammar API</H3>
|
||||
<P>
|
||||
Sooner or later it will happen that the resource grammar API
|
||||
@@ -573,7 +580,7 @@ that it does not include idiomatic expressions in a given language.
|
||||
The solution then is in the first place to build language-specific
|
||||
extension modules. This chapter will deal with this issue.
|
||||
</P>
|
||||
<A NAME="toc22"></A>
|
||||
<A NAME="toc21"></A>
|
||||
<H2>Writing an instance of parametrized resource grammar implementation</H2>
|
||||
<P>
|
||||
Above we have looked at how a resource implementation is built by
|
||||
@@ -591,7 +598,7 @@ use parametrized modules. The advantages are
|
||||
In this chapter, we will look at an example: adding Portuguese to
|
||||
the Romance family.
|
||||
</P>
|
||||
<A NAME="toc23"></A>
|
||||
<A NAME="toc22"></A>
|
||||
<H2>Parametrizing a resource grammar implementation</H2>
|
||||
<P>
|
||||
This is the most demanding form of resource grammar writing.
|
||||
@@ -607,6 +614,6 @@ This chapter will work out an example of how an Estonian grammar
|
||||
is constructed from the Finnish grammar through parametrization.
|
||||
</P>
|
||||
|
||||
<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
|
||||
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
|
||||
<!-- cmdline: txt2tags -\-toc -thtml Resource-HOWTO.txt -->
|
||||
</BODY></HTML>
|
||||
|
||||
@@ -1,5 +1,4 @@
|
||||
|
||||
Resource grammar HOWTO
|
||||
Resource grammar writing HOWTO
|
||||
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
|
||||
Last update: %%date(%c)
|
||||
|
||||
@@ -10,23 +9,12 @@ Last update: %%date(%c)
|
||||
%!target:html
|
||||
|
||||
|
||||
=HOW TO WRITE A RESOURCE GRAMMAR=
|
||||
|
||||
|
||||
|
||||
[Aarne Ranta http://www.cs.chalmers.se/~aarne/]
|
||||
|
||||
%%Date
|
||||
|
||||
|
||||
|
||||
The purpose of this document is to tell how to implement the GF
|
||||
resource grammar API for a new language. We will //not// cover how
|
||||
to use the resource grammar, nor how to change the API. But we
|
||||
will give some hints on how to extend the API.
|
||||
|
||||
|
||||
|
||||
**Notice**. This document concerns the API v. 1.0 which has not
|
||||
yet been released. You can find the beginnings of it
|
||||
in [``GF/lib/resource-1.0/`` ..]. See the
|
||||
@@ -240,7 +228,11 @@ of resource v. 1.0.
|
||||
```
|
||||
tells you what exactly is missing.
|
||||
|
||||
Here is the module structure of ``TestGer``. It has been simplified by leaving out
|
||||
the majority of the phrase category modules. Each of them has the same dependencies
|
||||
as e.g. ``VerbGer``.
|
||||
|
||||
[German.png]
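
The step of uncommenting the module headers and brackets (the first and the last line of each ``.gf`` file) can be sketched as a small shell helper. This is only a sketch under assumptions that this document does not state: that the previous step commented out every line with a leading ``--``, and that the module header and the closing bracket sit on the first and the last line of the file. The function name ``uncomment_first_last`` is hypothetical.

```shell
# Hypothetical helper; assumes every line was commented out with a leading "--".
uncomment_first_last() {
  for f in "$@"; do
    # Strip the leading "--" from the first and the last line only,
    # leaving the module body commented.
    sed -e '1s/^--//' -e '$s/^--//' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```

It could be called as ``uncomment_first_last *Ger.gf``. Files whose header or closing bracket spans more than one line still need manual editing.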
|
||||
|
||||
|
||||
===The develop-test cycle===
|
||||
@@ -298,6 +290,13 @@ with the next one. Actually, a suitable subset of ``Noun``,
|
||||
very soon, keep you motivated, and reveal errors.
|
||||
|
||||
|
||||
Here is a [live log ../german/log.txt] of the actual process of
|
||||
building the German implementation of resource API v. 1.0.
|
||||
It is the basis of the more detailed explanations, which will
|
||||
follow soon. (You will find out that these explanations involve
|
||||
a rational reconstruction of the live process!)
|
||||
|
||||
|
||||
===Resource modules used===
|
||||
|
||||
These modules will be written by you.
|
||||
|
||||