updated tutorial and resource howto

2006-06-15 23:05:42 +00:00
parent 6065e73738
commit 17c3861c55
2 changed files with 190 additions and 16 deletions
--- a/resource-1.0/doc/Resource-HOWTO.html
+++ b/resource-1.0/doc/Resource-HOWTO.html
@@ -7,9 +7,56 @@
 <P ALIGN="center"><CENTER><H1>Resource grammar writing HOWTO</H1>
 <FONT SIZE="4">
 <I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
-Last update: Fri May 26 17:36:48 2006
+Last update: Fri Jun 16 00:59:52 2006
 </FONT></CENTER>
 <P></P>
 <HR NOSHADE SIZE=1>
 <P></P>
    <UL>
    <LI><A HREF="#toc1">The resource grammar API</A>
      <UL>
      <LI><A HREF="#toc2">Phrase category modules</A>
      <LI><A HREF="#toc3">Infrastructure modules</A>
      <LI><A HREF="#toc4">Lexical modules</A>
      </UL>
    <LI><A HREF="#toc5">Language-dependent syntax modules</A>
    <LI><A HREF="#toc6">The core of the syntax</A>
      <UL>
      <LI><A HREF="#toc7">Another reduced API</A>
      <LI><A HREF="#toc8">The present-tense fragment</A>
      </UL>
    <LI><A HREF="#toc9">Phases of the work</A>
      <UL>
      <LI><A HREF="#toc10">Putting up a directory</A>
      <LI><A HREF="#toc11">Direction of work</A>
      <LI><A HREF="#toc12">The develop-test cycle</A>
      <LI><A HREF="#toc13">Resource modules used</A>
      <LI><A HREF="#toc14">Morphology and lexicon</A>
      <LI><A HREF="#toc15">Lock fields</A>
      <LI><A HREF="#toc16">Lexicon construction</A>
      </UL>
    <LI><A HREF="#toc17">Inside grammar modules</A>
      <UL>
      <LI><A HREF="#toc18">The category system</A>
      <LI><A HREF="#toc19">Phrase category modules</A>
      <LI><A HREF="#toc20">Resource modules</A>
      <LI><A HREF="#toc21">Lexicon</A>
      </UL>
    <LI><A HREF="#toc22">Lexicon extension</A>
      <UL>
      <LI><A HREF="#toc23">The irregularity lexicon</A>
      <LI><A HREF="#toc24">Lexicon extraction from a word list</A>
      <LI><A HREF="#toc25">Lexicon extraction from raw text data</A>
      <LI><A HREF="#toc26">Extending the resource grammar API</A>
      </UL>
    <LI><A HREF="#toc27">Writing an instance of parametrized resource grammar implementation</A>
    <LI><A HREF="#toc28">Parametrizing a resource grammar implementation</A>
    </UL>
 <P></P>
 <HR NOSHADE SIZE=1>
 <P></P>
 <P>
 The purpose of this document is to tell how to implement the GF
 resource grammar API for a new language. We will <I>not</I> cover how
@@ -17,23 +64,43 @@ to use the resource grammar, nor how to change the API. But we
 will give some hints on how to extend the API.
 </P>
 <P>
-<B>Notice</B>. This document concerns the API v. 1.0 which has not
-yet been released. You can find the current code
-in <A HREF=".."><CODE>GF/lib/resource-1.0/</CODE></A>. See the
-<A HREF="../README"><CODE>resource-1.0/README</CODE></A> for
+A manual for using the resource grammar is found in
+</P>
+<P>
+<A HREF="../../../doc/resource.pdf"><CODE>http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf</CODE></A>.
 </P>
 <P>
 A tutorial on GF, also introducing the idea of resource grammars, is found in
 </P>
 <P>
 <A HREF="../../../doc/tutorial/gf-tutorial2.html"><CODE>http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html</CODE></A>.
 </P>
 <P>
 This document concerns the API v. 1.0. You can find the current code in 
 </P>
 <P>
 <A HREF=".."><CODE>http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/</CODE></A>
 </P>
 <P>
 See the <A HREF="../README"><CODE>README</CODE></A> for
 details on how this differs from previous versions.
 </P>
 <A NAME="toc1"></A>
 <H2>The resource grammar API</H2>
 <P>
 The API is divided into a bunch of <CODE>abstract</CODE> modules.
 The following figure gives the dependencies of these modules.
 </P>
 <P>
-<IMG ALIGN="left" SRC="Lang.png" BORDER="0" ALT=""> 
+<IMG ALIGN="left" SRC="Grammar.png" BORDER="0" ALT=""> 
 </P>
 <P>
-The module structure is rather flat: almost every module is a direct
-parent of the top module <CODE>Lang</CODE>. The idea
+Thus the API consists of a grammar and a lexicon, which is
+provided for test purposes.
 </P>
 <P>
 The module structure is rather flat: most modules are direct
 parents of <CODE>Grammar</CODE>. The idea
 is that you can concentrate on one linguistic aspect at a time, or
 also distribute the work among several authors. The module <CODE>Cat</CODE>
 defines the "glue" that ties the aspects together - a type system
@@ -41,6 +108,7 @@ to which all the other modules conform, so that e.g. <CODE>NP</CODE> means
 the same thing in those modules that use <CODE>NP</CODE>s and those that
 construct them.
 </P>
 <A NAME="toc2"></A>
 <H3>Phrase category modules</H3>
 <P>
 The direct parents of the top will be called <B>phrase category modules</B>,
@@ -65,6 +133,7 @@ one of a small number of different types). Thus we have
 <LI><CODE>Idiom</CODE>: idiomatic phrases such as existentials
 </UL>
 <A NAME="toc3"></A>
 <H3>Infrastructure modules</H3>
 <P>
 Expressions of each phrase category are constructed in the corresponding
@@ -93,6 +162,7 @@ can skip the <CODE>lincat</CODE> definition of a category and use the default
 <CODE>{s : Str}</CODE> until you need to change it to something else. In
 English, for instance, many categories do have this linearization type.
 </P>
 <A NAME="toc4"></A>
 <H3>Lexical modules</H3>
 <P>
 What is lexical and what is syntactic is not as clear-cut in GF as in
@@ -129,6 +199,45 @@ different languages on the level of a resource grammar. In other words,
 application grammars are likely to use the resource in different ways for
 different languages.
 </P>
 <A NAME="toc5"></A>
 <H2>Language-dependent syntax modules</H2>
 <P>
 In addition to the common API, there is room for language-dependent extensions
 of the resource. The top level of each language looks as follows (with English as example):
 </P>
 <PRE>
    abstract English = Grammar, ExtraEngAbs, DictEngAbs
 </PRE>
 <P>
 where <CODE>ExtraEngAbs</CODE> is a collection of syntactic structures specific to English,
 and <CODE>DictEngAbs</CODE> is an English dictionary 
 (at the moment, it consists of <CODE>IrregEngAbs</CODE>,
 the irregular verbs of English). Each of these language-specific grammars has 
 the potential to grow into a full-scale grammar of the language. These grammars
 can also be used as libraries, but the possibility of using functors is lost.
 </P>
 <P>
 To give a better overview of language-specific structures, 
 modules like <CODE>ExtraEngAbs</CODE>
 are built from a language-independent module <CODE>ExtraAbs</CODE> 
 by restricted inheritance:
 </P>
 <PRE>
    abstract ExtraEngAbs = Extra [f,g,...]
 </PRE>
 <P>
 Thus any category and function in <CODE>Extra</CODE> may be shared by a subset of all
 languages. One can see this set-up as a matrix, which tells 
 what <CODE>Extra</CODE> structures
 are implemented in what languages. For the common API in <CODE>Grammar</CODE>, the matrix
 is filled with 1's (everything is implemented in every language).
 </P>
 <P>
 In a minimal resource grammar implementation, the language-dependent
 extensions are just empty modules, but it is good to provide them for
 the sake of uniformity.
 </P>
 <A NAME="toc6"></A>
 <H2>The core of the syntax</H2>
 <P>
 Among all categories and functions, a handful are 
@@ -153,6 +262,7 @@ rules relate the categories to each other. It is intended to be a
 first approximation when designing the parameter system of a new
 language. 
 </P>
 <A NAME="toc7"></A>
 <H3>Another reduced API</H3>
 <P>
 If you want to experiment with a small subset of the resource API first, 
@@ -161,6 +271,7 @@ try out the module
 explained in the
 <A HREF="http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html">GF Tutorial</A>.
 </P>
 <A NAME="toc8"></A>
 <H3>The present-tense fragment</H3>
 <P>
 Some lines in the resource library are suffixed with the comment
@@ -176,7 +287,9 @@ implementation. To compile a grammar with present-tense-only, use
    i -preproc=GF/lib/resource-1.0/mkPresent LangGer.gf
 </PRE>
 <P></P>
 <A NAME="toc9"></A>
 <H2>Phases of the work</H2>
 <A NAME="toc10"></A>
 <H3>Putting up a directory</H3>
 <P>
 Unless you are writing an instance of a parametrized implementation
@@ -262,6 +375,7 @@ as e.g. <CODE>VerbGer</CODE>.
 <P>
 <IMG ALIGN="middle" SRC="German.png" BORDER="0" ALT="">
 </P>
 <A NAME="toc11"></A>
 <H3>Direction of work</H3>
 <P>
 The real work starts now. There are many ways to proceed, the main ones being
@@ -360,6 +474,7 @@ and dependences there are in your language, and you can now produce very
 much in the order you please. 
 </OL>
 <A NAME="toc12"></A>
 <H3>The develop-test cycle</H3>
 <P>
 The following develop-test cycle will
@@ -416,6 +531,7 @@ follow soon. (You will find out that these explanations involve
 a rational reconstruction of the live process! Among other things, the
 API was changed during the actual process to make it more intuitive.)
 </P>
 <A NAME="toc13"></A>
 <H3>Resource modules used</H3>
 <P>
 These modules will be written by you.
@@ -472,6 +588,7 @@ almost everything. This led in practice to the duplication of almost
 all code on the <CODE>lin</CODE> and <CODE>oper</CODE> levels, and made the code
 hard to understand and maintain.
 </P>
 <A NAME="toc14"></A>
 <H3>Morphology and lexicon</H3>
 <P>
 The paradigms needed to implement
@@ -542,6 +659,7 @@ These constants are defined in terms of parameter types and constructors
 in <CODE>ResGer</CODE> and <CODE>MorphoGer</CODE>, which modules are not
 visible to the application grammarian.
 </P>
 <A NAME="toc15"></A>
 <H3>Lock fields</H3>
 <P>
 An important difference between <CODE>MorphoGer</CODE> and
@@ -588,6 +706,7 @@ in her hidden definitions of constants in <CODE>Paradigms</CODE>. For instance,
    -- mkAdv s = {s = s ; lock_Adv = &lt;&gt;} ;
 </PRE>
 <P></P>
 <A NAME="toc16"></A>
 <H3>Lexicon construction</H3>
 <P>
 The lexicon belonging to <CODE>LangGer</CODE> consists of two modules:
@@ -607,17 +726,20 @@ the coverage of the paradigms gets thereby tested and that the
 use of the paradigms in <CODE>LexiconGer</CODE> gives a good set of examples for
 those who want to build new lexica.
 </P>
 <A NAME="toc17"></A>
 <H2>Inside grammar modules</H2>
 <P>
 Detailed implementation tricks
 are found in the comments of each module.
 </P>
 <A NAME="toc18"></A>
 <H3>The category system</H3>
 <UL>
 <LI><A HREF="gfdoc/Common.html">Common</A>, <A HREF="../common/CommonX.gf">CommonX</A>
 <LI><A HREF="gfdoc/Cat.html">Cat</A>, <A HREF="gfdoc/CatGer.gf">CatGer</A>
 </UL>
 <A NAME="toc19"></A>
 <H3>Phrase category modules</H3>
 <UL>
 <LI><A HREF="gfdoc/Noun.html">Noun</A>, <A HREF="../german/NounGer.gf">NounGer</A>
@@ -635,6 +757,7 @@ are found in the comments of each module.
 <LI><A HREF="gfdoc/Lang.html">Lang</A>, <A HREF="../german/LangGer.gf">LangGer</A>
 </UL>
 <A NAME="toc20"></A>
 <H3>Resource modules</H3>
 <UL>
 <LI><A HREF="../german/ResGer.gf">ResGer</A>
@@ -642,13 +765,16 @@ are found in the comments of each module.
 <LI><A HREF="gfdoc/ParadigmsGer.html">ParadigmsGer</A>, <A HREF="../german/ParadigmsGer.gf">ParadigmsGer.gf</A>
 </UL>
 <A NAME="toc21"></A>
 <H3>Lexicon</H3>
 <UL>
 <LI><A HREF="gfdoc/Structural.html">Structural</A>, <A HREF="../german/StructuralGer.gf">StructuralGer</A>
 <LI><A HREF="gfdoc/Lexicon.html">Lexicon</A>, <A HREF="../german/LexiconGer.gf">LexiconGer</A>
 </UL>
 <A NAME="toc22"></A>
 <H2>Lexicon extension</H2>
 <A NAME="toc23"></A>
 <H3>The irregularity lexicon</H3>
 <P>
 It may be handy to provide a separate module of irregular
@@ -658,6 +784,7 @@ few hundred perhaps. Building such a lexicon separately also
 makes it less important to cover <I>everything</I> by the
 worst-case paradigms (<CODE>mkV</CODE> etc).
 </P>
 <A NAME="toc24"></A>
 <H3>Lexicon extraction from a word list</H3>
 <P>
 You can often find resources such as lists of 
@@ -692,6 +819,7 @@ When using ready-made word lists, you should think about
 copyright issues. Ideally, all resource grammar material should
 be provided under the GNU General Public License.
 </P>
 <A NAME="toc25"></A>
 <H3>Lexicon extraction from raw text data</H3>
 <P>
 This is a cheap technique to build a lexicon of thousands
@@ -699,6 +827,7 @@ of words, if text data is available in digital format.
 See the <A HREF="http://www.cs.chalmers.se/~markus/FM/">Functional Morphology</A> 
 homepage for details.
 </P>
 <A NAME="toc26"></A>
 <H3>Extending the resource grammar API</H3>
 <P>
 Sooner or later it will happen that the resource grammar API
@@ -707,6 +836,7 @@ that it does not include idiomatic expressions in a given language.
 The solution then is in the first place to build language-specific
 extension modules. This chapter will deal with this issue (to be completed).
 </P>
 <A NAME="toc27"></A>
 <H2>Writing an instance of parametrized resource grammar implementation</H2>
 <P>
 Above we have looked at how a resource implementation is built by
@@ -726,6 +856,7 @@ the Romance family (to be completed). Here is a set of
 <A HREF="http://www.cs.chalmers.se/~aarne/geocal2006.pdf">slides</A>
 on the topic.
 </P>
 <A NAME="toc28"></A>
 <H2>Parametrizing a resource grammar implementation</H2>
 <P>
 This is the most demanding form of resource grammar writing.
@@ -742,5 +873,5 @@ is constructed from the Finnish grammar through parametrization.
 </P>
 <!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
-<!-- cmdline: txt2tags Resource-HOWTO.txt -->
+<!-- cmdline: txt2tags -\-toc -thtml Resource-HOWTO.txt -->
 </BODY></HTML>
--- a/resource-1.0/doc/Resource-HOWTO.txt
+++ b/resource-1.0/doc/Resource-HOWTO.txt
@@ -14,11 +14,19 @@ resource grammar API for a new language. We will //not// cover how
 to use the resource grammar, nor how to change the API. But we
 will give some hints on how to extend the API.
 A manual for using the resource grammar is found in
-**Notice**. This document concerns the API v. 1.0 which has not
-yet been released. You can find the current code
-in [``GF/lib/resource-1.0/`` ..]. See the
-[``resource-1.0/README`` ../README] for
+[``http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf`` http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf].
+
+A tutorial on GF, also introducing the idea of resource grammars, is found in
+
 [``http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html`` ../../../doc/tutorial/gf-tutorial2.html].
 This document concerns the API v. 1.0. You can find the current code in 
 [``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/`` ..]
 See the [``README`` ../README] for
 details on how this differs from previous versions.
@@ -28,10 +36,13 @@ details on how this differs from previous versions.
 The API is divided into a bunch of ``abstract`` modules.
 The following figure gives the dependencies of these modules.
-[Lang.png] 
+[Grammar.png] 
-The module structure is rather flat: almost every module is a direct
-parent of the top module ``Lang``. The idea
+Thus the API consists of a grammar and a lexicon, which is
+provided for test purposes.
 The module structure is rather flat: most modules are direct
 parents of ``Grammar``. The idea
 is that you can concentrate on one linguistic aspect at a time, or
 also distribute the work among several authors. The module ``Cat``
 defines the "glue" that ties the aspects together - a type system
@@ -127,6 +138,38 @@ application grammars are likely to use the resource in different ways for
 different languages.
 ==Language-dependent syntax modules==
 In addition to the common API, there is room for language-dependent extensions
 of the resource. The top level of each language looks as follows (with English as example):
 ```
  abstract English = Grammar, ExtraEngAbs, DictEngAbs
 ```
 where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
 and ``DictEngAbs`` is an English dictionary 
 (at the moment, it consists of ``IrregEngAbs``,
 the irregular verbs of English). Each of these language-specific grammars has 
 the potential to grow into a full-scale grammar of the language. These grammars
 can also be used as libraries, but the possibility of using functors is lost.
 To give a better overview of language-specific structures, 
 modules like ``ExtraEngAbs``
 are built from a language-independent module ``ExtraAbs`` 
 by restricted inheritance:
 ```
  abstract ExtraEngAbs = Extra [f,g,...]
 ```
 Thus any category and function in ``Extra`` may be shared by a subset of all
 languages. One can see this set-up as a matrix, which tells 
 what ``Extra`` structures
 are implemented in what languages. For the common API in ``Grammar``, the matrix
 is filled with 1's (everything is implemented in every language).
 In a minimal resource grammar implementation, the language-dependent
 extensions are just empty modules, but it is good to provide them for
 the sake of uniformity.
 ==The core of the syntax==
 Among all categories and functions, a handful are