resource doc in tutorial

2026-07-08 14:42:46 -06:00 · 2007-05-31 13:43:46 +00:00
parent 74c032b688
commit ad1af38d60
2 changed files with 320 additions and 142 deletions
@@ -1,5 +1,5 @@
-The GF Resource Grammar Library
+The GF Resource Grammar Library, Version 1.2
-Author: Aarne Ranta, Ali El Dada, and Janna Khegai
+Authors: Aarne Ranta, Ali El Dada, Janna Khegai, and Björn Bringert
 Last update: %%date(%c)
 % NOTE: this is a txt2tags file.
@@ -1658,174 +1658,352 @@ All of the following uses of ``mkN`` are easy to resolve:
 %--!
 ==Using the resource grammar library TODO==
-A resource grammar is a grammar built on linguistic grounds,
+===Coverage===
-to describe a language rather than a domain.
+
-The GF resource grammar library, which contains resource grammars for
+The GF Resource Grammar Library contains grammar rules for
-10 languages, is described more closely in the following
+10 languages (in addition, 2 languages are available as incomplete
-documents:
+implementations, and a few more are under construction). Its purpose
- [Resource library API documentation ../../lib/resource-1.0/doc/]:
+is to make these rules available for application programmers,
-  for application grammarians using the resource.
+who can thereby concentrate on the semantic and stylistic
- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]:
+aspects of their grammars, without having to think about 
-  for resource grammarians developing the resource.
+grammaticality. The targeted level of application grammarians
 is that of a skilled programmer with
 a practical knowledge of the target languages, but without
 theoretical knowledge about their grammars.
 Such a combination of
 skills is typical of programmers who want to localize
 software to new languages.
 The current resource languages are
 - ``Ara``bic
 - ``Cat``alan
 - ``Dan``ish
 - ``Eng``lish
 - ``Fin``nish
 - ``Fre``nch
 - ``Ger``man
 - ``Ita``lian
 - ``Nor``wegian
 - ``Rus``sian
 - ``Spa``nish
 - ``Swe``dish
-===Interfaces, instances, and functors===
+The first three letters (``Eng`` etc.) are used in grammar module names.
 The Arabic and Catalan implementations are still incomplete, but 
 enough to be used in many applications.
-===The simplest way===
+To give an example application, consider
-
+music playing devices. In the application,
-The simplest way is to ``open`` a top-level ``Lang`` module
+we may have a semantic category ``Kind``, examples
-and a ``Paradigms`` module: 
+of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song`` 
 is linearized into the noun "Lied", but knowing this is not
 enough to make the application work, because the noun must be 
 produced in both singular and plural, and in four different
 cases. By using the resource grammar library, it is enough to
 write
 ```
-  abstract Foo = ...
+  lin Song = mkN "Lied" "Lieder" neuter
  concrete FooEng = open LangEng, ParadigmsEng in ...
  concrete FooSwe = open LangSwe, ParadigmsSwe in ...
 ```
-Here is an example.
+and the eight forms are correctly generated. The resource grammar
 library contains a complete set of inflectional paradigms (such as
 ``mkN`` here), enabling the definition of any lexical items.
 The resource grammar library is not only about inflectional paradigms - it
 also has syntax rules. The music player application
 might also want to modify songs with properties, such as "American",
 "old", "good". The German grammar for adjectival modifications is
 particularly complex, because adjectives have to agree in gender,
 number, and case, and also depend on what determiner is used
 ("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
 variation is taken care of by the resource grammar function
 ```
-abstract Arithm = {
+  fun AdjCN : AP -> CN -> CN
-  cat
+```
-    Prop ;
+(see the tables at the end of this document for the list of all resource grammar
-    Nat ;
+functions). The resource grammar implementation of the rule adding properties
-  fun
+to kinds is
-    Zero : Nat ;
+```
-    Succ : Nat -> Nat ;
+  lin PropKind kind prop = AdjCN prop kind
-    Even : Nat -> Prop ;
+```
-    And  : Prop -> Prop -> Prop ;
+given that 
-}
+```
  lincat Prop = AP
  lincat Kind = CN
 ```
 The resource library API is divided into language-specific 
 and language-independent parts. To put it roughly,
 - the lexicon API is language-specific
 - the syntax API is language-independent
 --# -path=.:alltenses:prelude
-concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
+Thus, to render the above example in French instead of German, we need to
-  lincat
+pick a different linearization of ``Song``,
-    Prop = S ;
+```
-    Nat  = NP ;
+  lin Song = mkN "chanson" feminine
-  lin
+```
-    Zero = 
+But to linearize ``PropKind``, we can use the very same rule as in German.
-      UsePN (regPN "zero" nonhuman) ;
+The resource function ``AdjCN`` has different implementations in the two
-    Succ n = 
+languages (e.g. a different word order in French), 
-      DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
+but the application programmer need not care about the difference.
    Even n = 
      UseCl TPres ASimul PPos 
        (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
    And x y = 
      ConjS and_Conj (BaseS x y) ;
 }
--# -path=.:alltenses:prelude
+===Note on APIs===
-concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
+From version 1.1 onwards, the resource library is available via two
-  lincat
+APIs:
-    Prop = S ;
+- original ``fun`` and ``oper`` definitions
-    Nat  = NP ;
+- overloaded ``oper`` definitions
-  lin
+
-    Zero = 
+
-      UsePN (regPN "noll" neutrum) ;
+Introducing overloading in GF version 2.7 has been a success in improving
-    Succ n = 
+the accessibility of libraries. It has also created a layer of abstraction
-      DetCN (DetSg (SgQuant DefArt) NoOrd) 
+between the writers and users of libraries, and thereby makes the library
-        (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare") 
+easier to modify. We shall therefore use the overloaded API
-           (mkPreposition "till")) n) ;
+in this document. The original function names are mainly interesting
-    Even n = 
+for those who want to write or modify libraries.
-      UseCl TPres ASimul PPos 
+
-        (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
+
-    And x y = 
+
-      ConjS and_Conj (BaseS x y) ;
+===A complete example===
-}
+
 To summarize the example, and also give a template for a programmer to work on,
 here is the complete implementation of a small system with songs and properties.
 The abstract syntax defines a "domain ontology":
 ```
  abstract Music = {
    cat 
      Kind, 
      Property ;
    fun 
      PropKind : Kind -> Property -> Kind ; 
      Song : Kind ;
      American : Property ;
    }
 ```
 The concrete syntax is defined by a functor (parametrized module),
 independently of language, by opening
 two interfaces: the resource ``Grammar`` and an application lexicon.
 ```
  incomplete concrete MusicI of Music = open Grammar, MusicLex in {
    lincat 
      Kind = CN ;
      Property = AP ;
    lin
      PropKind k p = AdjCN p k ;
      Song = UseN song_N ;
      American = PositA american_A ;
    }
 ```
 The application lexicon ``MusicLex`` has an abstract syntax that extends
 the resource category system ``Cat``.
 ```
  abstract MusicLex = Cat ** {
    fun
      song_N : N ;
      american_A : A ;
    }
 ```
 Each language has its own concrete syntax, which opens the 
 inflectional paradigms module for that language:
 ```
  concrete MusicLexGer of MusicLex = 
      CatGer ** open ParadigmsGer in {
    lin
      song_N = reg2N "Lied" "Lieder" neuter ;
      american_A = regA "amerikanisch" ;
    }
  concrete MusicLexFre of MusicLex = 
      CatFre ** open ParadigmsFre in {
    lin
      song_N = regGenN "chanson" feminine ;
      american_A = regA "américain" ;
    }
 ```
 The top-level ``Music`` grammars are obtained by 
 instantiating the two interfaces of ``MusicI``:
 ```
  concrete MusicGer of Music = MusicI with
    (Grammar = GrammarGer),
    (MusicLex = MusicLexGer) ;
  concrete MusicFre of Music = MusicI with
    (Grammar = GrammarFre),
    (MusicLex = MusicLexFre) ;
 ```
 Both of these files can use the same ``path``, defined as
 ```
  --# -path=.:present:prelude
 ```
 The ``present`` directory contains the compiled resources, restricted to
 present tense; ``alltenses`` has the full resources.
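 As a minimal sketch of how the instantiated grammars might be tried out, the
 GF shell commands ``i`` (import) and ``l`` (linearize), both used elsewhere in
 this document, can be applied to an abstract syntax tree of ``Music``. The file
 name ``MusicGer.gf`` is an assumption (each module stored in a file named after
 it, with the ``path`` flag above at the top of the file), and the output is
 omitted because it depends on the installed library version.
 ```
   > i MusicGer.gf
   > l PropKind Song American
 ```
 The same tree can be linearized with ``MusicFre.gf`` imported instead, since
 both concrete syntaxes share the abstract syntax ``Music``.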
 To localize the music player system to a new language, 
 all that is needed is two modules,
 one implementing ``MusicLex`` and the other 
 instantiating ``Music``. The latter is
 completely trivial, whereas the former one involves the choice of correct
 vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
 ```
  concrete MusicLexFin of MusicLex = 
      CatFin ** open ParadigmsFin in {
    lin
      song_N = regN "kappale" ;
      american_A = regA "amerikkalainen" ;
    }
  concrete MusicFin of Music = MusicI with
    (Grammar = GrammarFin),
    (MusicLex = MusicLexFin) ;
 ```
 More work is of course needed if the language-independent linearizations in
 MusicI are not satisfactory for some language. The resource grammar guarantees
 that the linearizations are possible in all languages, in the sense of being grammatical,
 but they might of course be inadequate for stylistic reasons. Assume, 
 for the sake of argument, that adjectival modification does not sound good in
 English, but that a relative clause would be preferable. One can then start as
 before,
 ```
  concrete MusicLexEng of MusicLex = 
      CatEng ** open ParadigmsEng in {
    lin
      song_N = regN "song" ;
      american_A = regA "American" ;
    }
  concrete MusicEng0 of Music = MusicI with
    (Grammar = GrammarEng),
    (MusicLex = MusicLexEng) ;
 ```
 The module ``MusicEng0`` would not be used on the top level, however, but
 another module would be built on top of it, with a restricted import from
 ``MusicEng0``. ``MusicEng`` inherits everything from ``MusicEng0`` 
 except ``PropKind``, and
 gives its own definition of this function:
 ```
  concrete MusicEng of Music = 
      MusicEng0 - [PropKind] ** open GrammarEng in {
    lin
      PropKind k p = 
        RelCN k (UseRCl TPres ASimul PPos 
          (RelVP IdRP (UseComp (CompAP p)))) ;
    }
 ```
 ===To find rules in the resource grammar library===
-===How to find resource functions===
+====Inflection paradigms====
-The definitions in this example were found by parsing:
+Inflection paradigms are defined separately for each language //L//
 in the module ``Paradigms``//L//. To test them, the command 
 ``cc`` (= ``compute_concrete``)
 can be used:
 ```
-  > i LangEng.gf
+  > i -retain german/ParadigmsGer.gf
-  -- for Successor:
+  > cc mkN "Schlange"
-  > p -cat=NP -mcfg -parser=topdown "the mother of Paris"
+  {
-
+    s : Number => Case => Str = table Number {
-  -- for Even:
+      Sg => table Case {
-  > p -cat=S -mcfg -parser=topdown "Paris is old"
+        Nom => "Schlange" ;
-
+        Acc => "Schlange" ;
-  -- for And:
+        Dat => "Schlange" ;
-  > p -cat=S -mcfg -parser=topdown "Paris is old and I am old"
+        Gen => "Schlange"
        } ;
      Pl => table Case {
        Nom => "Schlangen" ;
        Acc => "Schlangen" ;
        Dat => "Schlangen" ;
        Gen => "Schlangen"
        }
      } ;
    g : Gender = Fem
  }
 ```
-The use of parsing can be systematized by **example-based grammar writing**,
+For the sake of convenience, every language implements these five paradigms:
 to which we will return later. 
 ===A functor implementation===
 The interesting thing now is that the
 code in ``ArithmSwe`` is similar to the code in ``ArithmEng``, except for
 some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
 "jämn" vs. "even").  How can we exploit the similarities and
 actually share code between the languages?
 The solution is to use a functor: an ``incomplete`` module that opens
 an ``abstract`` as an ``interface``, and then instantiate it to different
 languages that implement the interface. The structure is as follows:
 ```
-  abstract Foo ...
+  oper
-
+    mkN  : Str -> N ;   -- regular nouns
-  incomplete concrete FooI = open Lang, Lex in ...
+    mkA  : Str -> A ;   -- regular adjectives
-
+    mkV  : Str -> V ;   -- regular verbs
-  concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ;
+    mkPN : Str -> PN ;  -- regular proper names
-  concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ;
+    mkV2 : V   -> V2 ;  -- direct transitive verbs
 ```
-where ``Lex`` is an abstract lexicon that includes the vocabulary
+It is often possible to initialize a lexicon by just using these functions,
-specific to this application:
+and later revise it by using the more involved paradigms. For instance, in
 German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
 masculine noun with the plural form ``"Liede"``. 
 The individual ``Paradigms`` modules
 tell what cases are covered by the regular heuristics.
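 Whether the regular heuristic is adequate for a given word can be checked with
 ``cc`` before committing to a lexicon entry. The following is a sketch that
 reuses only the import command and the ``mkN`` calls shown earlier in this
 document; the output is omitted because it depends on the library version.
 ```
   > i -retain german/ParadigmsGer.gf
   > cc mkN "Lied"
   > cc mkN "Lied" "Lieder" neuter
 ```
 Comparing the two resulting tables shows where the one-argument paradigm
 guesses a wrong plural or gender, and hence where the more involved paradigm
 is needed.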
 As a limiting case, one could even initialize the lexicon for a new language
 by copying the English (or some other already existing) lexicon. This would
 produce language with correct grammar but with content words directly borrowed from
 English - maybe not so strange in certain technical domains.
 ====Syntax rules====
 Syntax rules should be looked for in the module ``Constructors``.
 Below this top-level module exposing overloaded constructors,
 there are around 10 abstract modules, each defining constructors for
 a group of one or more related categories. For instance, the module
 ``Noun`` defines how to construct common nouns, noun phrases, and determiners.
 But these special modules are seldom needed by the users of the library.
 TODO: when are they needed?
 Browsing the libraries is helped by the gfdoc-generated HTML pages,
 whose LaTeX versions are included in the present document.
 ====Browsing by the parser====
 An alternative to browsing the library documentation is
 to use the parser.
 Even though parsing is not an intended end-user application 
 of resource grammars, it is a useful technique for application grammarians
 to browse the library. To find out which resource function implements
 a particular structure, one can just parse a string that exemplifies this 
 structure. For instance, to find out how sentences are built using 
 transitive verbs, write
 ```
-  abstract Lex = Cat ** ...
+  > i english/LangEng.gf
  > p -cat=Cl -fcfg "she loves him"
-  concrete LexEng of Lex = CatEng ** open ParadigmsEng in ...
+  PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
  concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...  
 ```
-Here, again, a complete example (``abstract Arithm`` is as above):
+The parser returns original constructors, not overloaded ones.
 Parsing with the English resource grammar has an acceptable speed, but
 with most languages it takes just too many resources even to build the
 parser. However, examples parsed in one language can always be linearized into
 other languages:
 ```
-incomplete concrete ArithmI of Arithm = open Lang, Lex in {
+  > i italian/LangIta.gf
  lincat
    Prop = S ;
    Nat  = NP ;
  lin
    Zero = 
      UsePN zero_PN ;
    Succ n = 
      DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
    Even n = 
      UseCl TPres ASimul PPos 
        (PredVP n (UseComp (CompAP (PositA even_A)))) ;
    And x y = 
      ConjS and_Conj (BaseS x y) ;
 }
--# -path=.:alltenses:prelude
+  > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
 concrete ArithmEng of Arithm = ArithmI with
  (Lang = LangEng),
  (Lex = LexEng) ;
--# -path=.:alltenses:prelude
+  lo ama
 concrete ArithmSwe of Arithm = ArithmI with
  (Lang = LangSwe),
  (Lex = LexSwe) ;
 abstract Lex = Cat ** {
  fun
    zero_PN : PN ;
    successor_N2 : N2 ;  
    even_A : A ;
 }
 concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
  lin 
    zero_PN = regPN "noll" neutrum ;
    successor_N2 = 
      mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
    even_A = regA "jämn" ;
 }
 ```
 Therefore, one can use the English parser to write an Italian grammar, and also
 to write a language-independent (incomplete) grammar. One can also parse strings
 that are bizarre in English but the intended way of expression in another language.
 For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
 This can be built by parsing "I have beer" in ``LangEng`` and then writing
 ```
  lin IamHungry = 
    let beer_N = regGenN "fame" feminine 
    in
    PredVP (UsePron i_Pron) (ComplV2 have_V2 
      (DetCN (DetSg MassDet NoOrd) (UseN beer_N))) ;
 ```
 which uses ParadigmsIta.regGenN. 
 ===Restricted inheritance and qualified opening===