forked from GitHub/gf-core
updated tutorial and resource howto
@@ -7,7 +7,7 @@
<P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR>
Last update: Wed Jan 25 16:03:03 2006
Last update: Fri Jun 16 01:02:28 2006
</FONT></CENTER>

<P></P>
@@ -34,7 +34,7 @@ Last update: Wed Jan 25 16:03:03 2006
<LI><A HREF="#toc15">Labelled context-free grammars</A>
<LI><A HREF="#toc16">The labelled context-free format</A>
</UL>
<LI><A HREF="#toc17">The ``.gf`` grammar format</A>
<LI><A HREF="#toc17">The .gf grammar format</A>
<UL>
<LI><A HREF="#toc18">Abstract and concrete syntax</A>
<LI><A HREF="#toc19">Judgement forms</A>
@@ -70,8 +70,8 @@ Last update: Wed Jan 25 16:03:03 2006
<UL>
<LI><A HREF="#toc42">Parameters and tables</A>
<LI><A HREF="#toc43">Inflection tables, paradigms, and ``oper`` definitions</A>
<LI><A HREF="#toc44">Worst-case macros and data abstraction</A>
<LI><A HREF="#toc45">A system of paradigms using ``Prelude`` operations</A>
<LI><A HREF="#toc44">Worst-case functions and data abstraction</A>
<LI><A HREF="#toc45">A system of paradigms using Prelude operations</A>
<LI><A HREF="#toc46">An intelligent noun paradigm using ``case`` expressions</A>
<LI><A HREF="#toc47">Pattern matching</A>
<LI><A HREF="#toc48">Morphological ``resource`` modules</A>
@@ -96,34 +96,41 @@ Last update: Wed Jan 25 16:03:03 2006
<LI><A HREF="#toc63">Prefix-dependent choices</A>
<LI><A HREF="#toc64">Predefined types and operations</A>
</UL>
<LI><A HREF="#toc65">More features of the module system</A>
<LI><A HREF="#toc65">More concepts of abstract syntax</A>
<UL>
<LI><A HREF="#toc66">Interfaces, instances, and functors</A>
<LI><A HREF="#toc67">Resource grammars and their reuse</A>
<LI><A HREF="#toc68">Restricted inheritance and qualified opening</A>
<LI><A HREF="#toc66">GF as a logical framework</A>
<LI><A HREF="#toc67">Dependent types</A>
<LI><A HREF="#toc68">Higher-order abstract syntax</A>
<LI><A HREF="#toc69">Semantic definitions</A>
<LI><A HREF="#toc70">List categories</A>
</UL>
<LI><A HREF="#toc69">More concepts of abstract syntax</A>
<LI><A HREF="#toc71">More features of the module system</A>
<UL>
<LI><A HREF="#toc70">Dependent types</A>
<LI><A HREF="#toc71">Higher-order abstract syntax</A>
<LI><A HREF="#toc72">Semantic definitions</A>
<LI><A HREF="#toc73">List categories</A>
<LI><A HREF="#toc72">Interfaces, instances, and functors</A>
<LI><A HREF="#toc73">Resource grammars and their reuse</A>
<LI><A HREF="#toc74">Restricted inheritance and qualified opening</A>
</UL>
<LI><A HREF="#toc74">Transfer modules</A>
<LI><A HREF="#toc75">Practical issues</A>
<LI><A HREF="#toc75">Using the standard resource library</A>
<UL>
<LI><A HREF="#toc76">Lexers and unlexers</A>
<LI><A HREF="#toc77">Efficiency of grammars</A>
<LI><A HREF="#toc78">Speech input and output</A>
<LI><A HREF="#toc79">Multilingual syntax editor</A>
<LI><A HREF="#toc80">Interactive Development Environment (IDE)</A>
<LI><A HREF="#toc81">Communicating with GF</A>
<LI><A HREF="#toc82">Embedded grammars in Haskell, Java, and Prolog</A>
<LI><A HREF="#toc83">Alternative input and output grammar formats</A>
<LI><A HREF="#toc76">The simplest way</A>
<LI><A HREF="#toc77">How to find resource functions</A>
<LI><A HREF="#toc78">A functor implementation</A>
</UL>
<LI><A HREF="#toc84">Case studies</A>
<LI><A HREF="#toc79">Transfer modules</A>
<LI><A HREF="#toc80">Practical issues</A>
<UL>
<LI><A HREF="#toc85">Interfacing formal and natural languages</A>
<LI><A HREF="#toc81">Lexers and unlexers</A>
<LI><A HREF="#toc82">Efficiency of grammars</A>
<LI><A HREF="#toc83">Speech input and output</A>
<LI><A HREF="#toc84">Multilingual syntax editor</A>
<LI><A HREF="#toc85">Interactive Development Environment (IDE)</A>
<LI><A HREF="#toc86">Communicating with GF</A>
<LI><A HREF="#toc87">Embedded grammars in Haskell, Java, and Prolog</A>
<LI><A HREF="#toc88">Alternative input and output grammar formats</A>
</UL>
<LI><A HREF="#toc89">Case studies</A>
<UL>
<LI><A HREF="#toc90">Interfacing formal and natural languages</A>
</UL>
</UL>

@@ -222,7 +229,8 @@ These grammars can be used as <B>libraries</B> to define application grammars.
In this way, it is possible to write a high-quality grammar without
knowing about linguistics: in general, to write an application grammar
by using the resource library just requires practical knowledge of
the target language.
the target language, and all theoretical knowledge about its grammar
is given by the libraries.
</P>
<A NAME="toc4"></A>
<H3>Who is this tutorial for</H3>
@@ -258,9 +266,10 @@ notation (also known as BNF). The BNF format is often a good
starting point for GF grammar development, because it is
simple and widely used. However, the BNF format is not
good for multilingual grammars. While it is possible to
translate the words contained in a BNF grammar to another
language, proper translation usually involves more, e.g.
changing the word order in
"translate" by just changing the words contained in a
BNF grammar to words of some other
language, proper translation usually involves more.
For instance, the order of words may have to be changed:
</P>
<PRE>
  Italian cheese ===> formaggio italiano
@@ -279,14 +288,14 @@ Italian adjectives usually have four forms where English
has just one:
</P>
<PRE>
  delicious (wine | wines | pizza | pizzas)
  delicious (wine, wines, pizza, pizzas)
  vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose
</PRE>
<P>
The <B>morphology</B> of a language describes the
forms of its words. While the complete description of morphology
belongs to resource grammars, the tutorial will explain the
main programming concepts involved. This will moreover
belongs to resource grammars, this tutorial will explain the
programming concepts involved in morphology. This will moreover
make it possible to grow the fragment covered by the food example.
The tutorial will in fact build a toy resource grammar in order
to illustrate the module structure of library-based application
@@ -584,7 +593,7 @@ a sentence but a sequence of ten sentences.
<H3>Labelled context-free grammars</H3>
<P>
The syntax trees returned by GF's parser in the previous examples
are not so nice to look at. The identifiers of form <CODE>Mks</CODE>
are not so nice to look at. The identifiers that form the tree
are <B>labels</B> of the BNF rules. To see which label corresponds to
which rule, you can use the <CODE>print_grammar = pg</CODE> command
with the <CODE>printer</CODE> flag set to <CODE>cf</CODE> (which means context-free):
@@ -631,7 +640,7 @@ labels to each rule.
In files with the suffix <CODE>.cf</CODE>, you can prefix rules with
labels that you provide yourself - these may be more useful
than the automatically generated ones. The following is a possible
labelling of <CODE>paleolithic.cf</CODE> with nicer-looking labels.
labelling of <CODE>food.cf</CODE> with nicer-looking labels.
</P>
<PRE>
  Is. S ::= Item "is" Quality ;
@@ -661,7 +670,7 @@ With this grammar, the trees look as follows:
<IMG ALIGN="middle" SRC="Tree2.png" BORDER="0" ALT="">
</P>
<A NAME="toc17"></A>
<H2>The ``.gf`` grammar format</H2>
<H2>The .gf grammar format</H2>
<P>
To see what there is in GF's shell state when a grammar
has been imported, you can give the plain command
@@ -696,7 +705,7 @@ A GF grammar consists of two main parts:
</UL>

<P>
The EBNF and CF formats fuse these two things together, but it is possible
The CF format fuses these two things together, but it is possible
to take them apart. For instance, the sentence formation rule
</P>
<PRE>
@@ -773,7 +782,7 @@ judgement forms:
<P>
We return to the precise meanings of these judgement forms later.
First we will look at how judgements are grouped into modules, and
show how the paleolithic grammar is
show how the food grammar is
expressed by using modules and judgements.
</P>
<A NAME="toc20"></A>
@@ -950,7 +959,7 @@ A system with this property is called a <B>multilingual grammar</B>.
</P>
<P>
Multilingual grammars can be used for applications such as
translation. Let us buid an Italian concrete syntax for
translation. Let us build an Italian concrete syntax for
<CODE>Food</CODE> and then test the resulting
multilingual grammar.
</P>
@@ -1179,10 +1188,11 @@ The graph uses
<LI>square boxes for concrete modules
<LI>black-headed arrows for inheritance
<LI>white-headed arrows for the concrete-of-abstract relation
<P></P>
<IMG ALIGN="middle" SRC="Foodmarket.png" BORDER="0" ALT="">
</UL>

<P>
<IMG ALIGN="middle" SRC="Foodmarket.png" BORDER="0" ALT="">
</P>
<A NAME="toc34"></A>
<H2>System commands</H2>
<P>
@@ -1203,7 +1213,7 @@ shell escape symbol <CODE>!</CODE>. The resulting graph was shown in the previou
<P>
The command <CODE>print_multi = pm</CODE> is used for printing the current multilingual
grammar in various formats, of which the format <CODE>-printer=graph</CODE> just
shows the module dependencies. Use the <CODE>help</CODE> to see what other formats
shows the module dependencies. Use <CODE>help</CODE> to see what other formats
are available:
</P>
<PRE>
@@ -1216,9 +1226,9 @@ are available:
<A NAME="toc36"></A>
<H3>The golden rule of functional programming</H3>
<P>
In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format still looks rather
In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format looks rather
verbose, and demands lots more characters to be written. You have probably
done this by the copy-paste-modify method, which is a standard way to
done this by the copy-paste-modify method, which is a common way to
avoid repeating work.
</P>
<P>
@@ -1232,8 +1242,8 @@ method. The <B>golden rule of functional programming</B> says that
<P>
A function separates the shared parts of different computations from the
changing parts, parameters. In functional programming languages, such as
<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share muc more than in
the languages such as C and Java.
<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share much more than in
languages such as C and Java.
</P>
<A NAME="toc37"></A>
<H3>Operation definitions</H3>
@@ -1283,11 +1293,8 @@ strings and records.
  resource StringOper = {
    oper
      SS : Type = {s : Str} ;

      ss : Str -> SS = \x -> {s = x} ;

      cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;

      prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
  }
</PRE>
@@ -1433,7 +1440,7 @@ forms of a word are formed.
</P>
<P>
From GF point of view, a paradigm is a function that takes a <B>lemma</B> -
a string also known as a <B>dictionary form</B> - and returns an inflection
also known as a <B>dictionary form</B> - and returns an inflection
table of desired type. Paradigms are not functions in the sense of the
<CODE>fun</CODE> judgements of abstract syntax (which operate on trees and not
on strings), but operations defined in <CODE>oper</CODE> judgements.
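<P>
As a minimal sketch of a paradigm written as an <CODE>oper</CODE>
(assuming the linearization type <CODE>Noun = {s : Number => Str}</CODE>
used in this tutorial):
</P>
<PRE>
  oper regNoun : Str -> Noun = \x -> {
    s = table {
      Sg => x ;
      Pl => x + "s"
      }
    } ;
</PRE>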
@@ -1457,13 +1464,13 @@ are written together to form one <B>token</B>. Thus, for instance,
</PRE>
<P></P>
<A NAME="toc44"></A>
<H3>Worst-case macros and data abstraction</H3>
<H3>Worst-case functions and data abstraction</H3>
<P>
Some English nouns, such as <CODE>mouse</CODE>, are so irregular that
it makes no sense to see them as instances of a paradigm. Even
then, it is useful to perform <B>data abstraction</B> from the
definition of the type <CODE>Noun</CODE>, and introduce a constructor
operation, a <B>worst-case macro</B> for nouns:
operation, a <B>worst-case function</B> for nouns:
</P>
<PRE>
  oper mkNoun : Str -> Str -> Noun = \x,y -> {
@@ -1490,7 +1497,7 @@ and
instead of writing the inflection table explicitly.
</P>
<P>
The grammar engineering advantage of worst-case macros is that
The grammar engineering advantage of worst-case functions is that
the author of the resource module may change the definitions of
<CODE>Noun</CODE> and <CODE>mkNoun</CODE>, and still retain the
interface (i.e. the system of type signatures) that makes it
@@ -1498,7 +1505,7 @@ correct to use these functions in concrete modules. In programming
terms, <CODE>Noun</CODE> is then treated as an <B>abstract datatype</B>.
</P>
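<P>
For illustration, the paradigms can then be re-expressed through the
worst-case function, so that they no longer mention the record structure
of <CODE>Noun</CODE> at all (a sketch along the lines of this section):
</P>
<PRE>
  oper regNoun : Str -> Noun = \x -> mkNoun x (x + "s") ;
  oper mouse_N : Noun = mkNoun "mouse" "mice" ;
</PRE>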
<A NAME="toc45"></A>
<H3>A system of paradigms using ``Prelude`` operations</H3>
<H3>A system of paradigms using Prelude operations</H3>
<P>
In addition to the completely regular noun paradigm <CODE>regNoun</CODE>,
some other frequent noun paradigms deserve to be
@@ -1707,7 +1714,7 @@ The rule of subject-verb agreement in English says that the verb
phrase must be inflected in the number of the subject. This
means that a noun phrase (functioning as a subject), inherently
<I>has</I> a number, which it passes to the verb. The verb does not
<I>have</I> a number, but must be able to receive whatever number the
<I>have</I> a number, but must be able to <I>receive</I> whatever number the
subject has. This distinction is nicely represented by the
different linearization types of <B>noun phrases</B> and <B>verb phrases</B>:
</P>
@@ -1717,7 +1724,7 @@ different linearization types of <B>noun phrases</B> and <B>verb phrases</B>:
</PRE>
<P>
We say that the number of <CODE>NP</CODE> is an <B>inherent feature</B>,
whereas the number of <CODE>VP</CODE> is <B>parametric</B>.
whereas the number of <CODE>VP</CODE> is a <B>variable feature</B> (or a
<B>parametric feature</B>).
</P>
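<P>
A minimal sketch of the two linearization types and of a predication
rule that performs the agreement (the exact field names are
illustrative):
</P>
<PRE>
  lincat NP = {s : Str ; n : Number} ;  -- number is inherent
  lincat VP = {s : Number => Str} ;     -- number is variable

  lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
</PRE>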
<P>
The agreement rule itself is expressed in the linearization rule of
@@ -1823,7 +1831,7 @@ Here is an example of pattern matching, the paradigm of regular adjectives.
  }
</PRE>
<P>
A constructor can have patterns as arguments. For instance,
A constructor can be used as a pattern that has patterns as arguments. For instance,
the adjectival paradigm in which the two singular forms are the same,
can be defined
</P>
@@ -1837,9 +1845,9 @@
<A NAME="toc54"></A>
<H3>Morphological analysis and morphology quiz</H3>
<P>
Even though in GF morphology
is mostly seen as an auxiliary of syntax, a morphology once defined
can be used on its own right. The command <CODE>morpho_analyse = ma</CODE>
Even though morphology is in GF
mostly used as an auxiliary for syntax, it
can also be useful in its own right. The command <CODE>morpho_analyse = ma</CODE>
can be used to read a text and return for each word the analyses that
it has in the current concrete syntax.
</P>
@@ -1865,11 +1873,12 @@ the category is set to be something else than <CODE>S</CODE>. For instance,
  Score 0/1
</PRE>
<P>
Finally, a list of morphological exercises and save it in a
Finally, a list of morphological exercises can be generated
off-line and saved in a
file for later use, by the command <CODE>morpho_list = ml</CODE>
</P>
<PRE>
  > morpho_list -number=25 -cat=V
  > morpho_list -number=25 -cat=V | wf exx.txt
</PRE>
<P>
The <CODE>number</CODE> flag gives the number of exercises generated.
@@ -1884,25 +1893,36 @@ a sentence may place the object between the verb and the particle:
<I>he switched it off</I>.
</P>
<P>
The first of the following judgements defines transitive verbs as
The following judgement defines transitive verbs as
<B>discontinuous constituents</B>, i.e. as having a linearization
type with two strings and not just one. The second judgement
type with two strings and not just one.
</P>
<PRE>
  lincat TV = {s : Number => Str ; part : Str} ;
</PRE>
<P>
This linearization rule
shows how the constituents are separated by the object in complementization.
</P>
<PRE>
  lincat TV = {s : Number => Str ; part : Str} ;
  lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ;
</PRE>
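<P>
A lexical entry for such a verb then puts the particle in a field of its
own (a hypothetical entry, for illustration):
</P>
<PRE>
  lin Switch_off = {
    s = table {Sg => "switches" ; Pl => "switch"} ;
    part = "off"
    } ;
</PRE>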
<P>
There is no restriction in the number of discontinuous constituents
(or other fields) a <CODE>lincat</CODE> may contain. The only condition is that
the fields must be of finite types, i.e. built from records, tables,
parameters, and <CODE>Str</CODE>, and not functions. A mathematical result
parameters, and <CODE>Str</CODE>, and not functions.
</P>
<P>
A mathematical result
about parsing in GF says that the worst-case complexity of parsing
increases with the number of discontinuous constituents. Moreover,
the parsing and linearization commands only give reliable results
for categories whose linearization type has a unique <CODE>Str</CODE> valued
field labelled <CODE>s</CODE>.
increases with the number of discontinuous constituents. This is
potentially a reason to avoid discontinuous constituents.
Moreover, the parsing and linearization commands only give accurate
results for categories whose linearization type has a unique <CODE>Str</CODE>
valued field labelled <CODE>s</CODE>. Therefore, discontinuous constituents
are not a good idea in top-level categories accessed by the users
of a grammar application.
</P>
<A NAME="toc56"></A>
<H2>More constructs for concrete syntax</H2>
@@ -1953,8 +1973,25 @@ can be used e.g. if a word lacks a certain form.
In general, <CODE>variants</CODE> should be used cautiously. It is not
recommended for modules aimed to be libraries, because the
user of the library has no way to choose among the variants.
Moreover, even though <CODE>variants</CODE> admits lists of any type,
its semantics for complex types can cause surprises.
Moreover, <CODE>variants</CODE> is only defined for basic types (<CODE>Str</CODE>
and parameter types). The grammar compiler will admit
<CODE>variants</CODE> for any types, but it will push it to the
level of basic types in a way that may be unwanted.
For instance, German has two words meaning "car",
<I>Wagen</I>, which is Masculine, and <I>Auto</I>, which is Neuter.
However, if one writes
</P>
<PRE>
  variants {{s = "Wagen" ; g = Masc} ; {s = "Auto" ; g = Neutr}}
</PRE>
<P>
this will compute to
</P>
<PRE>
  {s = variants {"Wagen" ; "Auto"} ; g = variants {Masc ; Neutr}}
</PRE>
<P>
which will also accept erroneous combinations of strings and genders.
</P>
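<P>
One way to stay safe is to avoid <CODE>variants</CODE> at the record level
altogether and give each word a rule of its own (a sketch; the function
names are invented):
</P>
<PRE>
  lin Car1 = {s = "Wagen" ; g = Masc} ;
  lin Car2 = {s = "Auto" ; g = Neutr} ;
</PRE>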
<A NAME="toc59"></A>
<H3>Record extension and subtyping</H3>
@@ -2039,9 +2076,6 @@ possible to write, slightly surprisingly,
<A NAME="toc62"></A>
<H3>Regular expression patterns</H3>
<P>
(New since 7 January 2006.)
</P>
<P>
To define string operations computed at compile time, such
as in morphology, it is handy to use regular expression patterns:
</P>
@@ -2076,7 +2110,6 @@ Another example: English noun plural formation.
    x + "y" => x + "ies" ;
    _ => w + "s"
  } ;

</PRE>
<P>
Semantics: variables are always bound to the <B>first match</B>, which is the first
@@ -2085,8 +2118,10 @@ in the sequence of binding lists <CODE>Match p v</CODE> defined as follows. In t
<PRE>
  Match (p1|p2) v = Match p1 v ++ Match p2 v
  Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
  Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
  Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 |
                      i <- [0..length s], (s1,s2) = splitAt i s]
  Match p* s = [[]] if Match "" s ++ Match p s ++ Match (p+p) s ++... /= []
  Match -p v = [[]] if Match p v = []
  Match c v = [[]] if c == v -- for constant and literal patterns c
  Match x v = [[(x,v)]] -- for variable patterns x
  Match x@p v = [[(x,v)]] + M if M = Match p v /= []
@@ -2097,14 +2132,18 @@ Examples:
</P>
<UL>
<LI><CODE>x + "e" + y</CODE> matches <CODE>"peter"</CODE> with <CODE>x = "p", y = "ter"</CODE>
<LI><CODE>x@("foo"*)</CODE> matches any token with <CODE>x = ""</CODE>
<LI><CODE>x + y@("er"*)</CODE> matches <CODE>"burgerer"</CODE> with <CODE>x = "burg", y = "erer"</CODE>
<LI><CODE>x + "er"*</CODE> matches <CODE>"burgerer"</CODE> with <CODE>x = "burg"</CODE>
</UL>

<A NAME="toc63"></A>
<H3>Prefix-dependent choices</H3>
<P>
The construct exemplified in
Sometimes a token has different forms depending on the token
that follows. An example is the English indefinite article,
which is <I>an</I> if a vowel follows, <I>a</I> otherwise.
Which form is chosen can only be decided at run time, i.e.
when a string is actually built. GF has a special construct for
such tokens, the <CODE>pre</CODE> construct exemplified in
</P>
<PRE>
  oper artIndef : Str =
@@ -2152,22 +2191,61 @@ they can be used as arguments. For example:

  -- e.g. (StreetAddress 10 "Downing Street") : Address
</PRE>
<P></P>
<P>
The linearization type is <CODE>{s : Str}</CODE> for all these categories.
</P>
<A NAME="toc65"></A>
<H2>More features of the module system</H2>
<H2>More concepts of abstract syntax</H2>
<A NAME="toc66"></A>
<H3>Interfaces, instances, and functors</H3>
<H3>GF as a logical framework</H3>
<P>
In this section, we will show how
to encode advanced semantic concepts in an abstract syntax.
We use concepts inherited from <B>type theory</B>. Type theory
is the basis of many systems known as <B>logical frameworks</B>, which are
used for representing mathematical theorems and their proofs on a computer.
In fact, GF has a logical framework as its proper part:
this part is the abstract syntax.
</P>
<P>
In a logical framework, the formalization of a mathematical theory
is a set of type and function declarations. The following is an example
of such a theory, represented as an <CODE>abstract</CODE> module in GF.
</P>
<PRE>
  abstract Geometry = {
    cat
      Line ; Point ; Circle ; -- basic types of figures
      Prop ; -- proposition
    fun
      Parallel : Line -> Line -> Prop ; -- x is parallel to y
      Centre : Circle -> Point ; -- the centre of c
  }
</PRE>
<P></P>
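<P>
A concrete syntax for such a theory is written like any other concrete
module (a minimal English sketch, not included in the tutorial files):
</P>
<PRE>
  concrete GeometryEng of Geometry = {
    lincat Line, Point, Circle, Prop = {s : Str} ;
    lin
      Parallel x y = {s = x.s ++ "is parallel to" ++ y.s} ;
      Centre c = {s = "the centre of" ++ c.s} ;
  }
</PRE>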
<A NAME="toc67"></A>
<H3>Dependent types</H3>
<A NAME="toc68"></A>
<H3>Higher-order abstract syntax</H3>
<A NAME="toc69"></A>
<H3>Semantic definitions</H3>
<A NAME="toc70"></A>
<H3>List categories</H3>
<A NAME="toc71"></A>
<H2>More features of the module system</H2>
<A NAME="toc72"></A>
<H3>Interfaces, instances, and functors</H3>
<A NAME="toc73"></A>
<H3>Resource grammars and their reuse</H3>
<P>
A resource grammar is a grammar built on linguistic grounds,
to describe a language rather than a domain.
The GF resource grammar library contains resource grammars for
The GF resource grammar library, which contains resource grammars for
10 languages, is described more closely in the following
documents:
</P>
<UL>
<LI><A HREF="../../lib/resource/doc/gf-resource.html">Resource library API documentation</A>:
<LI><A HREF="../../lib/resource-1.0/doc/">Resource library API documentation</A>:
for application grammarians using the resource.
<LI><A HREF="../../lib/resource-1.0/doc/Resource-HOWTO.html">Resource writing HOWTO</A>:
for resource grammarians developing the resource.
@@ -2177,21 +2255,41 @@ documents:
However, to give a flavour of both using and writing resource grammars,
we have created a miniature resource, which resides in the
subdirectory <A HREF="resource"><CODE>resource</CODE></A>. Its API consists of the following
modules:
three modules:
</P>
<UL>
<LI><A HREF="resource/Syntax.gf">Syntax</A>: syntactic structures, language-independent
<LI><A HREF="resource/LexEng.gf">LexEng</A>: lexical paradigms, English
<LI><A HREF="resource/LexIta.gf">LexIta</A>: lexical paradigms, Italian
</UL>

<P>
<A HREF="resource/Syntax.gf">Syntax</A> - syntactic structures, language-independent:
</P>
<PRE>

</PRE>
<P>
<A HREF="resource/LexEng.gf">LexEng</A> - lexical paradigms, English:
</P>
<PRE>

</PRE>
<P>
<A HREF="resource/LexIta.gf">LexIta</A> - lexical paradigms, Italian:
</P>
<PRE>

</PRE>
<P></P>
<P>
Only these three modules should be <CODE>open</CODE>ed in applications.
The implementations of the resource are given in the following four modules:
</P>
<P>
<A HREF="resource/MorphoEng.gf">MorphoEng</A>,
</P>
<PRE>

</PRE>
<P>
<A HREF="resource/MorphoIta.gf">MorphoIta</A>: low-level morphology
</P>
<UL>
<LI><A HREF="resource/MorphoEng.gf">MorphoEng</A>,
<A HREF="resource/MorphoIta.gf">MorphoIta</A>: low-level morphology
<LI><A HREF="resource/SyntaxEng.gf">SyntaxEng</A>,
<A HREF="resource/SyntaxIta.gf">SyntaxIta</A>: definitions of syntactic structures
</UL>
@@ -2210,19 +2308,181 @@ The rest of the modules (black) come from the resource.
<P>
<IMG ALIGN="middle" SRC="Multi.png" BORDER="0" ALT="">
</P>
<A NAME="toc68"></A>
<H3>Restricted inheritance and qualified opening</H3>
<A NAME="toc69"></A>
<H2>More concepts of abstract syntax</H2>
<A NAME="toc70"></A>
<H3>Dependent types</H3>
<A NAME="toc71"></A>
<H3>Higher-order abstract syntax</H3>
<A NAME="toc72"></A>
<H3>Semantic definitions</H3>
<A NAME="toc73"></A>
<H3>List categories</H3>
<A NAME="toc74"></A>
<H3>Restricted inheritance and qualified opening</H3>
<A NAME="toc75"></A>
<H2>Using the standard resource library</H2>
<P>
The example files of this chapter can be found in
the directory <A HREF="./arithm"><CODE>arithm</CODE></A>.
</P>
<A NAME="toc76"></A>
<H3>The simplest way</H3>
<P>
The simplest way is to <CODE>open</CODE> a top-level <CODE>Lang</CODE> module
and a <CODE>Paradigms</CODE> module:
</P>
<PRE>
  abstract Foo = ...

  concrete FooEng = open LangEng, ParadigmsEng in ...
  concrete FooSwe = open LangSwe, ParadigmsSwe in ...
</PRE>
<P>
Here is an example.
</P>
<PRE>
  abstract Arithm = {
    cat
      Prop ;
      Nat ;
    fun
      Zero : Nat ;
      Succ : Nat -> Nat ;
      Even : Nat -> Prop ;
      And : Prop -> Prop -> Prop ;
  }

  --# -path=.:alltenses:prelude

  concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
    lincat
      Prop = S ;
      Nat = NP ;
    lin
      Zero =
        UsePN (regPN "zero" nonhuman) ;
      Succ n =
        DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
      Even n =
        UseCl TPres ASimul PPos
          (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
      And x y =
        ConjS and_Conj (BaseS x y) ;

  }

  --# -path=.:alltenses:prelude

  concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
    lincat
      Prop = S ;
      Nat = NP ;
    lin
      Zero =
        UsePN (regPN "noll" neutrum) ;
      Succ n =
        DetCN (DetSg (SgQuant DefArt) NoOrd)
          (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare")
            (mkPreposition "till")) n) ;
      Even n =
        UseCl TPres ASimul PPos
          (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
      And x y =
        ConjS and_Conj (BaseS x y) ;
  }
</PRE>
<P></P>
<A NAME="toc77"></A>
<H3>How to find resource functions</H3>
<P>
The definitions in this example were found by parsing:
</P>
<PRE>
  > i LangEng.gf

  -- for Successor:
  > p -cat=NP -mcfg -parser=topdown "the mother of Paris"

  -- for Even:
  > p -cat=S -mcfg -parser=topdown "Paris is old"

  -- for And:
  > p -cat=S -mcfg -parser=topdown "Paris is old and I am old"
</PRE>
<P>
The use of parsing can be systematized by <B>example-based grammar writing</B>,
to which we will return later.
</P>
<A NAME="toc78"></A>
<H3>A functor implementation</H3>
<P>
The interesting thing now is that the
code in <CODE>ArithmSwe</CODE> is similar to the code in <CODE>ArithmEng</CODE>, except for
some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
"jämn" vs. "even"). How can we exploit the similarities and
actually share code between the languages?
</P>
<P>
The solution is to use a functor: an <CODE>incomplete</CODE> module that opens
an <CODE>abstract</CODE> as an <CODE>interface</CODE>, and is then instantiated to the different
languages that implement the interface. The structure is as follows:
</P>
<PRE>
  abstract Foo ...

  incomplete concrete FooI = open Lang, Lex in ...

  concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ;
  concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ;
</PRE>
<P>
where <CODE>Lex</CODE> is an abstract lexicon that includes the vocabulary
specific to this application:
</P>
<PRE>
  abstract Lex = Cat ** ...

  concrete LexEng of Lex = CatEng ** open ParadigmsEng in ...
  concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...
</PRE>
<P>
Here, again, a complete example (<CODE>abstract Arithm</CODE> is as above):
</P>
|
||||
<PRE>
incomplete concrete ArithmI of Arithm = open Lang, Lex in {
  lincat
    Prop = S ;
    Nat = NP ;
  lin
    Zero =
      UsePN zero_PN ;
    Succ n =
      DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
    Even n =
      UseCl TPres ASimul PPos
        (PredVP n (UseComp (CompAP (PositA even_A)))) ;
    And x y =
      ConjS and_Conj (BaseS x y) ;
}

--# -path=.:alltenses:prelude
concrete ArithmEng of Arithm = ArithmI with
  (Lang = LangEng),
  (Lex = LexEng) ;

--# -path=.:alltenses:prelude
concrete ArithmSwe of Arithm = ArithmI with
  (Lang = LangSwe),
  (Lex = LexSwe) ;

abstract Lex = Cat ** {
  fun
    zero_PN : PN ;
    successor_N2 : N2 ;
    even_A : A ;
}

concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
  lin
    zero_PN = regPN "noll" neutrum ;
    successor_N2 =
      mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
    even_A = regA "jämn" ;
}
</PRE>
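<P>
Note that the example spells out <CODE>LexSwe</CODE> but not the English
instance <CODE>LexEng</CODE>, which <CODE>ArithmEng</CODE> refers to. Here is a
sketch of what it could look like; the <CODE>ParadigmsEng</CODE> operations
assumed here (<CODE>regPN</CODE>, <CODE>regN</CODE>, <CODE>mkN2</CODE>,
<CODE>mkPrep</CODE>, <CODE>regA</CODE>) may have different names or type
signatures in your version of the resource library.
</P>
<PRE>
concrete LexEng of Lex = CatEng ** open ParadigmsEng in {
  lin
    zero_PN = regPN "zero" ;
    successor_N2 = mkN2 (regN "successor") (mkPrep "of") ;
    even_A = regA "even" ;
}
</PRE>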
<P></P>
<A NAME="toc79"></A>
<H2>Transfer modules</H2>
<P>
Transfer means noncompositional tree-transforming operations.
@@ -2241,9 +2501,9 @@ See the
<A HREF="../transfer.html">transfer language documentation</A>
for more information.
</P>
<A NAME="toc75"></A>
<A NAME="toc80"></A>
<H2>Practical issues</H2>
<A NAME="toc76"></A>
<A NAME="toc81"></A>
<H3>Lexers and unlexers</H3>
<P>
Lexers and unlexers can be chosen from
@@ -2279,7 +2539,7 @@ Given by <CODE>help -lexer</CODE>, <CODE>help -unlexer</CODE>:

</PRE>
<P></P>
<A NAME="toc77"></A>
<A NAME="toc82"></A>
<H3>Efficiency of grammars</H3>
<P>
Issues:
@@ -2290,7 +2550,7 @@ Issues:
<LI>parsing efficiency: <CODE>-mcfg</CODE> vs. others
</UL>

<A NAME="toc78"></A>
<A NAME="toc83"></A>
<H3>Speech input and output</H3>
<P>
The <CODE>speak_aloud = sa</CODE> command sends a string to the speech
@@ -2320,7 +2580,7 @@ The method works only for grammars of English.
Both Flite and ATK are freely available through the links
above, but they are not distributed together with GF.
</P>
<A NAME="toc79"></A>
<A NAME="toc84"></A>
<H3>Multilingual syntax editor</H3>
<P>
The
@@ -2337,12 +2597,12 @@ Here is a snapshot of the editor:
The grammars of the snapshot are from the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/examples/letter">Letter grammar package</A>.
</P>
<A NAME="toc80"></A>
<A NAME="toc85"></A>
<H3>Interactive Development Environment (IDE)</H3>
<P>
Forthcoming.
</P>
<A NAME="toc81"></A>
<A NAME="toc86"></A>
<H3>Communicating with GF</H3>
<P>
Other processes can communicate with the GF command interpreter,
@@ -2359,7 +2619,7 @@ Thus the most silent way to invoke GF is
</PRE>
</UL>

<A NAME="toc82"></A>
<A NAME="toc87"></A>
<H3>Embedded grammars in Haskell, Java, and Prolog</H3>
<P>
GF grammars can be used as parts of programs written in the
@@ -2371,15 +2631,15 @@ following languages. The links give more documentation.
<LI><A HREF="http://www.cs.chalmers.se/~peb/software.html">Prolog</A>
</UL>

<A NAME="toc83"></A>
<A NAME="toc88"></A>
<H3>Alternative input and output grammar formats</H3>
<P>
A summary is given in the following chart of GF grammar compiler phases:
<IMG ALIGN="middle" SRC="../gf-compiler.png" BORDER="0" ALT="">
</P>
<A NAME="toc84"></A>
<A NAME="toc89"></A>
<H2>Case studies</H2>
<A NAME="toc85"></A>
<A NAME="toc90"></A>
<H3>Interfacing formal and natural languages</H3>
<P>
<A HREF="http://www.cs.chalmers.se/~krijo/thesis/thesisA4.pdf">Formal and Informal Software Specifications</A>,
@@ -2392,6 +2652,6 @@ English and German.
A simpler example will be explained here.
</P>

<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc gf-tutorial2.txt -->
</BODY></HTML>

@@ -7,9 +7,56 @@
<P ALIGN="center"><CENTER><H1>Resource grammar writing HOWTO</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR>
Last update: Fri May 26 17:36:48 2006
Last update: Fri Jun 16 00:59:52 2006
</FONT></CENTER>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">The resource grammar API</A>
<UL>
<LI><A HREF="#toc2">Phrase category modules</A>
<LI><A HREF="#toc3">Infrastructure modules</A>
<LI><A HREF="#toc4">Lexical modules</A>
</UL>
<LI><A HREF="#toc5">Language-dependent syntax modules</A>
<LI><A HREF="#toc6">The core of the syntax</A>
<UL>
<LI><A HREF="#toc7">Another reduced API</A>
<LI><A HREF="#toc8">The present-tense fragment</A>
</UL>
<LI><A HREF="#toc9">Phases of the work</A>
<UL>
<LI><A HREF="#toc10">Putting up a directory</A>
<LI><A HREF="#toc11">Direction of work</A>
<LI><A HREF="#toc12">The develop-test cycle</A>
<LI><A HREF="#toc13">Resource modules used</A>
<LI><A HREF="#toc14">Morphology and lexicon</A>
<LI><A HREF="#toc15">Lock fields</A>
<LI><A HREF="#toc16">Lexicon construction</A>
</UL>
<LI><A HREF="#toc17">Inside grammar modules</A>
<UL>
<LI><A HREF="#toc18">The category system</A>
<LI><A HREF="#toc19">Phrase category modules</A>
<LI><A HREF="#toc20">Resource modules</A>
<LI><A HREF="#toc21">Lexicon</A>
</UL>
<LI><A HREF="#toc22">Lexicon extension</A>
<UL>
<LI><A HREF="#toc23">The irregularity lexicon</A>
<LI><A HREF="#toc24">Lexicon extraction from a word list</A>
<LI><A HREF="#toc25">Lexicon extraction from raw text data</A>
<LI><A HREF="#toc26">Extending the resource grammar API</A>
</UL>
<LI><A HREF="#toc27">Writing an instance of parametrized resource grammar implementation</A>
<LI><A HREF="#toc28">Parametrizing a resource grammar implementation</A>
</UL>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
The purpose of this document is to tell how to implement the GF
resource grammar API for a new language. We will <I>not</I> cover how
@@ -17,23 +64,43 @@ to use the resource grammar, nor how to change the API. But we
will give some hints how to extend the API.
</P>
<P>
<B>Notice</B>. This document concerns the API v. 1.0 which has not
yet been released. You can find the current code
in <A HREF=".."><CODE>GF/lib/resource-1.0/</CODE></A>. See the
<A HREF="../README"><CODE>resource-1.0/README</CODE></A> for
A manual for using the resource grammar is found in
</P>
<P>
<A HREF="../../../doc/resource.pdf"><CODE>http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf</CODE></A>.
</P>
<P>
A tutorial on GF, also introducing the idea of resource grammars, is found in
</P>
<P>
<A HREF="../../../doc/tutorial/gf-tutorial2.html"><CODE>http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html</CODE></A>.
</P>
<P>
This document concerns the API v. 1.0. You can find the current code in
</P>
<P>
<A HREF=".."><CODE>http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/</CODE></A>
</P>
<P>
See the <A HREF="../README"><CODE>README</CODE></A> for
details on how this differs from previous versions.
</P>
<A NAME="toc1"></A>
<H2>The resource grammar API</H2>
<P>
The API is divided into a bunch of <CODE>abstract</CODE> modules.
The following figure gives the dependencies of these modules.
</P>
<P>
<IMG ALIGN="left" SRC="Lang.png" BORDER="0" ALT="">
<IMG ALIGN="left" SRC="Grammar.png" BORDER="0" ALT="">
</P>
<P>
The module structure is rather flat: almost every module is a direct
parent of the top module <CODE>Lang</CODE>. The idea
Thus the API consists of a grammar and a lexicon, which is
provided for test purposes.
</P>
<P>
The module structure is rather flat: most modules are direct
parents of <CODE>Grammar</CODE>. The idea
is that you can concentrate on one linguistic aspect at a time, or
also distribute the work among several authors. The module <CODE>Cat</CODE>
defines the "glue" that ties the aspects together - a type system
@@ -41,6 +108,7 @@ to which all the other modules conform, so that e.g. <CODE>NP</CODE> means
the same thing in those modules that use <CODE>NP</CODE>s and those that
construct them.
</P>
<A NAME="toc2"></A>
<H3>Phrase category modules</H3>
<P>
The direct parents of the top will be called <B>phrase category modules</B>,
@@ -65,6 +133,7 @@ one of a small number of different types). Thus we have
<LI><CODE>Idiom</CODE>: idiomatic phrases such as existentials
</UL>

<A NAME="toc3"></A>
<H3>Infrastructure modules</H3>
<P>
Expressions of each phrase category are constructed in the corresponding
@@ -93,6 +162,7 @@ can skip the <CODE>lincat</CODE> definition of a category and use the default
<CODE>{s : Str}</CODE> until you need to change it to something else. In
English, for instance, many categories do have this linearization type.
</P>
<A NAME="toc4"></A>
<H3>Lexical modules</H3>
<P>
What is lexical and what is syntactic is not as clearcut in GF as in
@@ -129,6 +199,45 @@ different languages on the level of a resource grammar. In other words,
application grammars are likely to use the resource in different ways for
different languages.
</P>
<A NAME="toc5"></A>
<H2>Language-dependent syntax modules</H2>
<P>
In addition to the common API, there is room for language-dependent extensions
of the resource. The top level of each language looks as follows (with English as an example):
</P>
<PRE>
abstract English = Grammar, ExtraEngAbs, DictEngAbs
</PRE>
<P>
where <CODE>ExtraEngAbs</CODE> is a collection of syntactic structures specific to English,
and <CODE>DictEngAbs</CODE> is an English dictionary
(at the moment, it consists of <CODE>IrregEngAbs</CODE>,
the irregular verbs of English). Each of these language-specific grammars has
the potential to grow into a full-scale grammar of the language. These grammars
can also be used as libraries, but the possibility of using functors is lost.
</P>
<P>
To give a better overview of language-specific structures,
modules like <CODE>ExtraEngAbs</CODE>
are built from a language-independent module <CODE>ExtraAbs</CODE>
by restricted inheritance:
</P>
<PRE>
abstract ExtraEngAbs = Extra [f,g,...]
</PRE>
<P>
Thus any category and function in <CODE>Extra</CODE> may be shared by a subset of all
languages. One can see this set-up as a matrix, which tells
what <CODE>Extra</CODE> structures
are implemented in what languages. For the common API in <CODE>Grammar</CODE>, the matrix
is filled with 1's (everything is implemented in every language).
</P>
<P>
In a minimal resource grammar implementation, the language-dependent
extensions are just empty modules, but it is good to provide them for
the sake of uniformity.
</P>
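<P>
For instance, for a hypothetical new language <CODE>Xxx</CODE> (the module
names here are purely illustrative, and the exact header syntax may vary
between library versions), such empty placeholder modules could be sketched as:
</P>
<PRE>
abstract ExtraXxxAbs = Cat ** { }

concrete ExtraXxx of ExtraXxxAbs = CatXxx ** { }
</PRE>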
<A NAME="toc6"></A>
<H2>The core of the syntax</H2>
<P>
Among all categories and functions, a handful are
@@ -153,6 +262,7 @@ rules relate the categories to each other. It is intended to be a
first approximation when designing the parameter system of a new
language.
</P>
<A NAME="toc7"></A>
<H3>Another reduced API</H3>
<P>
If you want to experiment with a small subset of the resource API first,
@@ -161,6 +271,7 @@ try out the module
explained in the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html">GF Tutorial</A>.
</P>
<A NAME="toc8"></A>
<H3>The present-tense fragment</H3>
<P>
Some lines in the resource library are suffixed with the comment
@@ -176,7 +287,9 @@ implementation. To compile a grammar with present-tense-only, use
i -preproc=GF/lib/resource-1.0/mkPresent LangGer.gf
</PRE>
<P></P>
<A NAME="toc9"></A>
<H2>Phases of the work</H2>
<A NAME="toc10"></A>
<H3>Putting up a directory</H3>
<P>
Unless you are writing an instance of a parametrized implementation
@@ -262,6 +375,7 @@ as e.g. <CODE>VerbGer</CODE>.
<P>
<IMG ALIGN="middle" SRC="German.png" BORDER="0" ALT="">
</P>
<A NAME="toc11"></A>
<H3>Direction of work</H3>
<P>
The real work starts now. There are many ways to proceed, the main ones being
@@ -360,6 +474,7 @@ and dependences there are in your language, and you can now produce very
much in the order you please.
</OL>

<A NAME="toc12"></A>
<H3>The develop-test cycle</H3>
<P>
The following develop-test cycle will
@@ -416,6 +531,7 @@ follow soon. (You will find out that these explanations involve
a rational reconstruction of the live process! Among other things, the
API was changed during the actual process to make it more intuitive.)
</P>
<A NAME="toc13"></A>
<H3>Resource modules used</H3>
<P>
These modules will be written by you.
@@ -472,6 +588,7 @@ almost everything. This led in practice to the duplication of almost
all code on the <CODE>lin</CODE> and <CODE>oper</CODE> levels, and made the code
hard to understand and maintain.
</P>
<A NAME="toc14"></A>
<H3>Morphology and lexicon</H3>
<P>
The paradigms needed to implement
@@ -542,6 +659,7 @@ These constants are defined in terms of parameter types and constructors
in <CODE>ResGer</CODE> and <CODE>MorphoGer</CODE>, modules which are not
visible to the application grammarian.
</P>
<A NAME="toc15"></A>
<H3>Lock fields</H3>
<P>
An important difference between <CODE>MorphoGer</CODE> and
@@ -588,6 +706,7 @@ in her hidden definitions of constants in <CODE>Paradigms</CODE>. For instance,
-- mkAdv s = {s = s ; lock_Adv = <>} ;
</PRE>
<P></P>
<A NAME="toc16"></A>
<H3>Lexicon construction</H3>
<P>
The lexicon belonging to <CODE>LangGer</CODE> consists of two modules:
@@ -607,17 +726,20 @@ the coverage of the paradigms gets thereby tested and that the
use of the paradigms in <CODE>LexiconGer</CODE> gives a good set of examples for
those who want to build new lexica.
</P>
<A NAME="toc17"></A>
<H2>Inside grammar modules</H2>
<P>
Detailed implementation tricks
are found in the comments of each module.
</P>
<A NAME="toc18"></A>
<H3>The category system</H3>
<UL>
<LI><A HREF="gfdoc/Common.html">Common</A>, <A HREF="../common/CommonX.gf">CommonX</A>
<LI><A HREF="gfdoc/Cat.html">Cat</A>, <A HREF="../german/CatGer.gf">CatGer</A>
</UL>

<A NAME="toc19"></A>
<H3>Phrase category modules</H3>
<UL>
<LI><A HREF="gfdoc/Noun.html">Noun</A>, <A HREF="../german/NounGer.gf">NounGer</A>
@@ -635,6 +757,7 @@ are found in the comments of each module.
<LI><A HREF="gfdoc/Lang.html">Lang</A>, <A HREF="../german/LangGer.gf">LangGer</A>
</UL>

<A NAME="toc20"></A>
<H3>Resource modules</H3>
<UL>
<LI><A HREF="../german/ResGer.gf">ResGer</A>
@@ -642,13 +765,16 @@ are found in the comments of each module.
<LI><A HREF="gfdoc/ParadigmsGer.html">ParadigmsGer</A>, <A HREF="../german/ParadigmsGer.gf">ParadigmsGer.gf</A>
</UL>

<A NAME="toc21"></A>
<H3>Lexicon</H3>
<UL>
<LI><A HREF="gfdoc/Structural.html">Structural</A>, <A HREF="../german/StructuralGer.gf">StructuralGer</A>
<LI><A HREF="gfdoc/Lexicon.html">Lexicon</A>, <A HREF="../german/LexiconGer.gf">LexiconGer</A>
</UL>

<A NAME="toc22"></A>
<H2>Lexicon extension</H2>
<A NAME="toc23"></A>
<H3>The irregularity lexicon</H3>
<P>
It may be handy to provide a separate module of irregular
@@ -658,6 +784,7 @@ few hundred perhaps. Building such a lexicon separately also
makes it less important to cover <I>everything</I> by the
worst-case paradigms (<CODE>mkV</CODE> etc).
</P>
<A NAME="toc24"></A>
<H3>Lexicon extraction from a word list</H3>
<P>
You can often find resources such as lists of
@@ -692,6 +819,7 @@ When using ready-made word lists, you should think about
copyright issues. Ideally, all resource grammar material should
be provided under the GNU General Public License.
</P>
<A NAME="toc25"></A>
<H3>Lexicon extraction from raw text data</H3>
<P>
This is a cheap technique to build a lexicon of thousands
@@ -699,6 +827,7 @@ of words, if text data is available in digital format.
See the <A HREF="http://www.cs.chalmers.se/~markus/FM/">Functional Morphology</A>
homepage for details.
</P>
<A NAME="toc26"></A>
<H3>Extending the resource grammar API</H3>
<P>
Sooner or later it will happen that the resource grammar API
@@ -707,6 +836,7 @@ that it does not include idiomatic expressions in a given language.
The solution then is in the first place to build language-specific
extension modules. This chapter will deal with this issue (to be completed).
</P>
<A NAME="toc27"></A>
<H2>Writing an instance of parametrized resource grammar implementation</H2>
<P>
Above we have looked at how a resource implementation is built by
@@ -726,6 +856,7 @@ the Romance family (to be completed). Here is a set of
<A HREF="http://www.cs.chalmers.se/~aarne/geocal2006.pdf">slides</A>
on the topic.
</P>
<A NAME="toc28"></A>
<H2>Parametrizing a resource grammar implementation</H2>
<P>
This is the most demanding form of resource grammar writing.
@@ -742,5 +873,5 @@ is constructed from the Finnish grammar through parametrization.
</P>

<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags Resource-HOWTO.txt -->
<!-- cmdline: txt2tags -\-toc -thtml Resource-HOWTO.txt -->
</BODY></HTML>

@@ -14,11 +14,19 @@ resource grammar API for a new language. We will //not// cover how
to use the resource grammar, nor how to change the API. But we
will give some hints how to extend the API.

A manual for using the resource grammar is found in

**Notice**. This document concerns the API v. 1.0 which has not
yet been released. You can find the current code
in [``GF/lib/resource-1.0/`` ..]. See the
[``resource-1.0/README`` ../README] for
[``http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf`` http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf].

A tutorial on GF, also introducing the idea of resource grammars, is found in

[``http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html`` ../../../doc/tutorial/gf-tutorial2.html].

This document concerns the API v. 1.0. You can find the current code in

[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/`` ..]

See the [``README`` ../README] for
details on how this differs from previous versions.


@@ -28,10 +36,13 @@ details on how this differs from previous versions.
The API is divided into a bunch of ``abstract`` modules.
The following figure gives the dependencies of these modules.

[Lang.png]
[Grammar.png]

The module structure is rather flat: almost every module is a direct
parent of the top module ``Lang``. The idea
Thus the API consists of a grammar and a lexicon, which is
provided for test purposes.

The module structure is rather flat: most modules are direct
parents of ``Grammar``. The idea
is that you can concentrate on one linguistic aspect at a time, or
also distribute the work among several authors. The module ``Cat``
defines the "glue" that ties the aspects together - a type system
@@ -127,6 +138,38 @@ application grammars are likely to use the resource in different ways for
different languages.


==Language-dependent syntax modules==

In addition to the common API, there is room for language-dependent extensions
of the resource. The top level of each language looks as follows (with English as an example):
```
abstract English = Grammar, ExtraEngAbs, DictEngAbs
```
where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
and ``DictEngAbs`` is an English dictionary
(at the moment, it consists of ``IrregEngAbs``,
the irregular verbs of English). Each of these language-specific grammars has
the potential to grow into a full-scale grammar of the language. These grammars
can also be used as libraries, but the possibility of using functors is lost.

To give a better overview of language-specific structures,
modules like ``ExtraEngAbs``
are built from a language-independent module ``ExtraAbs``
by restricted inheritance:
```
abstract ExtraEngAbs = Extra [f,g,...]
```
Thus any category and function in ``Extra`` may be shared by a subset of all
languages. One can see this set-up as a matrix, which tells
what ``Extra`` structures
are implemented in what languages. For the common API in ``Grammar``, the matrix
is filled with 1's (everything is implemented in every language).

In a minimal resource grammar implementation, the language-dependent
extensions are just empty modules, but it is good to provide them for
the sake of uniformity.


==The core of the syntax==

Among all categories and functions, a handful are